Bayesian networks provide an elegant formalism for representing and reasoning
about uncertainty using probability theory. They are a probabilistic extension of
propositional logic and, hence, inherit some of the limitations of propositional logic,
such as the difficulty of representing objects and relations. We introduce a
generalization of Bayesian networks, called Bayesian logic programs, to overcome these
limitations. In order to represent objects and relations, they combine Bayesian networks
with definite clause logic by establishing a one-to-one mapping between ground atoms
and random variables. We show that Bayesian logic programs combine the advantages
of both definite clause logic and Bayesian networks. This includes the separation
of quantitative and qualitative aspects of the model. Furthermore, Bayesian logic
programs generalize both Bayesian networks and logic programs, so many
ideas developed in both areas carry over.
1. Introduction

Bayesian networks [Pea91] are one of the most important, efficient and el- 
egant frameworks for representing and reasoning with probabilistic models. A 
single Bayesian network specifies a joint probability density over a finite set of 
random variables and consists of two components: (1) a qualitative one that en- 
codes the local influences among the random variables using a directed acyclic 
graph, and (2) a quantitative one that encodes the probability densities over these 
local influences. Bayesian networks have been applied to many real-world prob- 
lems in diagnosis, forecasting, automated vision, sensor fusion and manufacturing 
control (see the articles [HMW95,BH95,FF95,HBR95], which together form a
special issue of the Communications of the ACM).

However, Bayesian networks are a probabilistic extension of propositional 
logic. The limitations of propositional logic, which Bayesian networks inherit, are 
well-known, see e.g. [Poo93,NH97,Jae97,FL98,Kol99]: they have a rigid structure
and therefore have problems representing a variable number of objects or relations 
among objects. Consider e.g. building a probabilistic model of a class of computer 
networks with Bayesian networks. This is problematic because the complex and 
dynamic structure of computer networks, and the relations among their differ- 
ent components, cannot elegantly be modeled using Bayesian networks. Indeed, 
it is quite likely that the structure of different networks is at an abstract level 
quite similar. However, using Bayesian networks each computer network would 
need to be modeled by its own specific Bayesian network. There is no way of 
formulating general probabilistic regularities for all the computer networks.
Furthermore, whenever components are added to or deleted from a computer network, its
corresponding Bayesian network has to be modified. This in turn would lead to
exponential updating problems.

The above sketched problems are due to the propositional nature of Bayesian 
networks. They would disappear when using a first order formalism. It is there- 
fore no surprise that various researchers have proposed first order extensions 
of Bayesian networks, e.g. [Poo93,NH97,Jae97,Kol99]. Many of these techniques 
employ the notion of knowledge-based model construction [BGW94,Had99], where 
first-order rules with associated uncertainty parameters are used to generate spe- 
cific Bayesian networks for particular queries. This is especially useful in domains 
where the number of relevant random variables depends on the specific problem, 
such as computer networks, pedigree analysis, etc. 

The main contribution of this paper is the introduction of Bayesian logic 
programs. Bayesian logic programs combine Bayesian networks with definite 
clause logic, i.e. "pure" Prolog. Therefore, Bayesian logic programs are easy 
to understand and use by practitioners in both communities. Indeed, both 
Bayesian networks and "pure" Prolog programs can naturally be represented 
using Bayesian logic programs. Bayesian logic programs can also handle do- 
mains involving structured terms as well as continuous random variables, which 
is not the case with the earlier proposals. Furthermore, whereas the approaches 
of [Poo93,NH97,Jae97,Kol99] view a ground atom as a state of a random variable,
Bayesian logic programs view ground atoms as random variables. More precisely, we
establish a one-to-one mapping between ground atoms and random variables. This
results — as for Bayesian networks — in a strict separation of quantitative and 
qualitative components of the representation, which is considered one of the key 
features of Bayesian networks. 

We proceed as follows. Section 2 presents our example domain and reviews 
the basics of probability theory and definite clause logic. Section 3 motivates 
and introduces the representation language of Bayesian logic programs. Their 
declarative semantics are given in Section 4, and in Section 5 we discuss a
query-answering procedure. Section 6 analyzes the connection of the query-answering
procedure to SLD trees. In Section 7 we investigate the representational power of
Bayesian logic programs by showing how they can represent, among others, Bayesian
networks, definite clause programs and dynamic Bayesian networks. We discuss related
work in Section 8 and conclude the paper in Section 9. The appendix contains the
proofs of some theorems as well as a simple Prolog shell for interpreting Bayesian
logic programs.

2. Preliminaries 

Bayesian logic programs combine Bayesian networks with definite clause 
logic. Throughout the paper, we assume some familiarity with logic programming 
or Prolog (see e.g. [SS86,Bra86,Fla94]) as well as with Bayesian networks (see e.g. 
[RN95,Nil98,Pea91,CDLS99]). We will now briefly review the key concepts and 
ideas underlying Bayesian networks and definite clause logic. Before doing so, we 
provide a motivating example. 

2.1. A motivating example: genetics 

Throughout the paper we will employ an example from the field of genetics
to illustrate Bayesian logic programs. Genetics provides an intuitive and natural
application domain for first order probabilistic models, because it

• has a probabilistic nature given by the biological laws of inheritance, and 

• requires the representation of the relational familial structure of the individuals 
under study. 

As for the computer networks example, the genetic and probabilistic regularities 
hold across different families though their structure may be different. Relational 
or first order representations can be used to capture the qualitative aspects. A 
second reason for using an illustration from the field of genetics for the purposes 
of this paper, is that genetics studies the inheritance of both continuous and 
discrete phenotypes such as weight and blood type. An individual's phenotype 
is an observable characteristic, whose distribution is affected by an individual's 
heritable information in conjunction with environmental factors. 

The subfield of genetics which investigates continuous phenotypes is called
quantitative genetics¹, cf. [WJSR80,Fal81,Tho86]. For a range of models within
quantitative genetics "each individual has a polygenic value, or polygenotype,
which in the population is normally distributed" [Tho86, Section 6.2]. These models
are called polygenic models. The reason is that the effects of some
genes on a phenotype seem to be additive, i.e. each gene independently effects
additive changes of the phenotype. If equal effects are assumed, then the
phenotype value is normally distributed as illustrated in Figure 1. Here, the genetic
information aa has no effect on the phenotype and AA has an additive change
1 According to Falconer [Fal81, page 2], parts of the theoretical basis of quantitative genetics
were established by the geneticist Sewall Wright, the same person to whom the idea of using
graphical representations of probabilistic information can be traced back [Pea91].
Figure 1. The effects of one, two and four different genes underlying a phenotype, e.g. height:
when the number of underlying genes increases, the values of the phenotype tend to be normally
distributed. The numbers associated with the boxes in the two left histograms are values of the
phenotype. It is assumed that each gene independently but equally effects additive changes of
the phenotype. Lower case letters stand for no effect, and upper case letters for an effect.



of 2. Thus, the distribution of the phenotype is given by counting how often
a particular additive effect might occur. We will use a simplified model for the
polygenic value height as a running example throughout the paper. Let $\mathcal{N}(x, \mu, \Sigma)$
denote a Gaussian density with mean $\mu$ and variance $\Sigma$:

$$\mathcal{N}(x, \mu, \Sigma) := \frac{1}{\sqrt{2\pi\Sigma}} \exp\left(-\frac{(x - \mu)^2}{2\Sigma}\right)$$

Assume that the height of an individual has a priori a normal density with mean
175 [cm] and variance 60 [cm^2], i.e. $p(\mathit{Height} = h) = \mathcal{N}(h, 175, 60)$. Its a posteriori
density depends on the heights of the individual's parents, Height_M and
Height_F:

$$p(\mathit{Height} = h \mid \mathit{Height}_M = m, \mathit{Height}_F = f) = \mathcal{N}(h, \tfrac{1}{2}(m + f), 60) \tag{2.1}$$

The densities are visualized in Figure 2. Questions one could be interested in are 
e.g.: "what is the expected height of a person given that her grandfather's height 
is 182 cm ?" or "what is the expected height of a person given that her great 
granddaughter's height is 161 cm ?" 
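
As a minimal sketch of the height model, the following Python fragment evaluates the a priori density and the conditional density of Equation (2.1). The function names `normal_density`, `height_density` and `conditional_mean` are illustrative choices and not part of the paper; the third argument of `normal_density` is a variance, as in the text.

```python
import math

PRIOR_MEAN = 175.0   # a priori mean height in cm (from the text)
VARIANCE = 60.0      # variance in cm^2 (from the text)


def normal_density(x, mean, variance):
    """Gaussian density N(x, mean, variance); `variance` is not a standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def conditional_mean(mother, father):
    """Mean of the conditional density in Equation (2.1)."""
    return 0.5 * (mother + father)


def height_density(h, mother=None, father=None):
    """A priori density if a parental height is missing, Equation (2.1) otherwise."""
    if mother is None or father is None:
        return normal_density(h, PRIOR_MEAN, VARIANCE)
    return normal_density(h, conditional_mean(mother, father), VARIANCE)
```

For parental heights of 164 and 173, `conditional_mean(164.0, 173.0)` gives 168.5, the mean used in the second density of Figure 2.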



2.2. Bayesian networks 



In the discussion of Bayesian networks we will use X to denote a random
variable, x a state and $\mathbf{X}$ (resp. $\mathbf{x}$) a vector of variables (resp. values). Throughout
the paper, the set D(X) of all states of a random variable X is called the domain
of X. The set $D(\mathbf{X}) := \bigotimes_{X \in \mathbf{X}} D(X)$, where $\bigotimes$ denotes the Cartesian product,
is the domain, i.e. the set of joint states, of the random variables $\mathbf{X}$. We will focus on
real random variables of which discrete and finite random variables are special
Figure 2. Example densities of the model of height inheritance. (1) The a priori density
$p(\mathit{Height} = h)$ of a person's height is a normal density with mean 175 and variance 60,
$p(\mathit{Height} = h) = \mathcal{N}(h, 175, 60)$. (2) The a posteriori density given that the heights m, f of the parents M, F are
164 and 173, $p(\mathit{Height} = h \mid \mathit{Height}_M = m, \mathit{Height}_F = f) = \mathcal{N}(h, 168.5, 60)$.



cases². A (real) random variable can take values in the continuum, i.e. $D(X) = \mathbb{R}$,
and we can only talk about the probability

$$\mathbf{P}(X \in [a, b)) = \int_a^b p(X = x)\,dx \tag{2.2}$$

that the state of X falls in some interval [a, b). The function $p : \mathbb{R} \to \mathbb{R}$ is a
probability density, i.e. $p(X = x) \ge 0$ for all $x \in D(X)$ and $\int_{-\infty}^{+\infty} p(X = x)\,dx = 1$.
We will use the normal letter p, e.g. p(X = x), to denote that X takes the value
$x \in D(X)$, and the bold letter $\mathbf{p}$ to denote a probability density, e.g. $\mathbf{p}(X)$. We
will use the bold letter $\mathbf{P}$ to denote a probability measure, e.g. $\mathbf{P}(X)$. Given two
real random variables X, Y the conditional probability density of X given Y is
defined as $\mathbf{p}(X \mid Y) = \mathbf{p}(X, Y) / \mathbf{p}(Y)$. The marginalized probability density of X given
$\mathbf{p}(X, Y)$ is $\mathbf{p}(X) = \int_{-\infty}^{+\infty} \mathbf{p}(X, Y = y)\,dy$. We denote the cardinality of a set S
with |S|.
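
To illustrate Equation (2.2), the following Python sketch computes the probability that a Gaussian random variable falls into an interval [a, b), once in closed form via the error function and once by numerically integrating the density. The function names and the midpoint rule are illustrative assumptions; the parameters 175 and 60 in the usage note are those of the height example.

```python
import math


def normal_density(x, mean, variance):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def normal_cdf(x, mean, variance):
    """Cumulative distribution function of the Gaussian, using math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def interval_probability(a, b, mean, variance):
    """P(X in [a, b)) as in Equation (2.2), in closed form."""
    return normal_cdf(b, mean, variance) - normal_cdf(a, mean, variance)


def interval_probability_numeric(a, b, mean, variance, steps=10_000):
    """Midpoint-rule approximation of the integral in Equation (2.2)."""
    width = (b - a) / steps
    total = sum(normal_density(a + (k + 0.5) * width, mean, variance) for k in range(steps))
    return total * width
```

Calling `interval_probability(170.0, 180.0, 175.0, 60.0)` gives the probability that a height lies between 170 cm and 180 cm under the a priori density; the numeric variant should agree with it up to the discretization error.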

A Bayesian network [Pea91,CDLS99] represents the joint probability density
$\mathbf{p}(X_1, \ldots, X_n)$ (and due to equation (2.2) a probability measure) over a fixed,
finite set $\{X_1, \ldots, X_n\}$ of random variables. It is an augmented, acyclic graph,
where each node corresponds to a random variable $X_i$ (we will not distinguish
between the random variables and the nodes of the graph) and each edge indicates
a direct influence among the random variables. Figure 3 shows the graph of
a Bayesian network modelling our height example for a particular family. The
familial relationship, which is taken from the stud farm example in [Jen96], forms
the basis for the graph. The network states e.g. that Irene's height is influenced
by the heights of her parents Gwenn and Eric. The domain of each $X_i$, the set

2 Generalizations to more 'complex' random variables such as d-dimensional real random vari- 
ables could easily be obtained modulo the constraints well-known from probability theory (see 
e.g. [Bau91]). 



Figure 3. The graphical structure of a Bayesian network modelling the inheritance of height 
within a particular family. The familial relationship is taken from Jensen's stud farm exam- 
ple [Jen96, page 38]. 

of possible states of $X_i$, is $D(X_i) = \mathbb{R}$. If we use a wildcard Y for the individuals
ann, fred, ..., this could be written $D(\mathit{Height}(Y)) = \mathbb{R}$. The direct predecessors of
a node X, the parents of X, are denoted by Pa(X). A Bayesian network stipulates
a conditional independency assumption:

Assumption 2.1 (independency). Each node $X_i$ in the graph is conditionally
independent of any subset $\mathbf{A}$ of nodes that are not descendants of $X_i$ given a
joint state of $\mathrm{Pa}(X_i)$, i.e. $\mathbf{p}(X_i \mid \mathbf{A}, \mathrm{Pa}(X_i)) = \mathbf{p}(X_i \mid \mathrm{Pa}(X_i))$.

E.g. Height(irene) is conditionally independent of Height(ann) given
a joint state of its parents {Height(gwenn), Height(eric)}. Any pair
$(X_i, \mathrm{Pa}(X_i))$ is called the family of $X_i$, e.g. Height(irene)'s family is
(Height(irene), {Height(gwenn), Height(eric)}). Given the conditional
independence assumption, we can write down the joint probability density as

$$\mathbf{p}(X_1, \ldots, X_n) = \prod_{i=1}^{n} \mathbf{p}(X_i \mid \mathrm{Pa}(X_i)) \tag{2.3}$$

by applying the independency assumption 2.1 to the chain rule expression of the
joint probability density. Thereby, we associate to each node $X_i$ of the graph
the conditional probability densities $\mathbf{p}(X_i \mid \mathrm{Pa}(X_i))$, denoted as $cpd(X_i)$. The
function $cpd(X_i)$ specifies for each $\mathbf{u} \in D(\mathrm{Pa}(X_i))$ and for each $u \in D(X_i)$ the
conditional density value $p(X_i = u \mid \mathrm{Pa}(X_i) = \mathbf{u})$, denoted as $cpd(X_i)(u \mid \mathbf{u})$. For
our height example we set according to Equation (2.1):

$$cpd(X_i)(u) = \mathcal{N}(u, 175, 60) \tag{2.4}$$

if $\mathrm{Pa}(X_i) = \{\}$, and

$$cpd(X_i)(u \mid u_f, u_m) = \mathcal{N}(u, \tfrac{1}{2}(u_m + u_f), 60) \tag{2.5}$$

otherwise, where $u_m$ and $u_f$ are the heights of $X_i$'s mother and father. The
inference problem for a Bayesian network could be stated as follows:
Definition 2.2 (inference problem). Given a Bayesian network over
$\{X_1, \ldots, X_n\}$, a joint state $\mathbf{u}$ of $\mathbf{Y} \subseteq \{X_1, \ldots, X_n\}$ and a set of query
variables $\mathbf{Q} \subseteq \{X_1, \ldots, X_n\}$, find the conditional density $\mathbf{p}(\mathbf{Q} \mid \mathbf{Y} = \mathbf{u})$.

In our height example, one might e.g. query for p(Height(john)) or
p(Height(john) | Height(ann) = 165). The answers would be
$p(\mathit{Height}(john) = h) = \mathcal{N}(h, 175, 112.45)$ and
$p(\mathit{Height}(john) = h \mid \mathit{Height}(ann) = 165) = \mathcal{N}(h, 171.25, 111.56)$³.
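
Because every density in the height network is Gaussian with a mean that is linear in the parents, such queries can also be checked by hand. The following Python sketch is not the Hugin computation referred to in the text; it assumes that the five founders are independent with density (2.4), that every other individual follows (2.5), and it uses the standard formula for conditioning a joint Gaussian on a single observed variable. The names `PARENTS`, `joint_moments` and `condition` are hypothetical.

```python
PRIOR_MEAN, PRIOR_VAR, NOISE_VAR = 175.0, 60.0, 60.0  # Equations (2.4) and (2.5)

# child -> (mother, father); founders have no parents. The insertion order is topological.
PARENTS = {
    "ann": (), "brian": (), "cecily": (), "unknown1": (), "unknown2": (),
    "dorothy": ("ann", "brian"),
    "eric": ("cecily", "brian"),
    "fred": ("ann", "unknown1"),
    "gwenn": ("ann", "unknown2"),
    "henry": ("dorothy", "fred"),
    "irene": ("gwenn", "eric"),
    "john": ("irene", "henry"),
}


def joint_moments(parents):
    """Means and covariances of the joint Gaussian defined by the network."""
    mean, cov = {}, {}
    for node, pa in parents.items():
        earlier = list(mean)  # nodes processed so far
        if not pa:
            mean[node] = PRIOR_MEAN
            for other in earlier:
                cov[node, other] = cov[other, node] = 0.0
            cov[node, node] = PRIOR_VAR
            continue
        # The child is the average of its parents plus independent noise.
        mean[node] = sum(mean[p] for p in pa) / len(pa)
        for other in earlier:
            c = sum(cov[p, other] for p in pa) / len(pa)
            cov[node, other] = cov[other, node] = c
        cov[node, node] = sum(cov[p, node] for p in pa) / len(pa) + NOISE_VAR
    return mean, cov


def condition(mean, cov, query, evidence, value):
    """Mean and variance of `query` given that `evidence` takes `value`."""
    gain = cov[query, evidence] / cov[evidence, evidence]
    return (mean[query] + gain * (value - mean[evidence]),
            cov[query, query] - gain * cov[query, evidence])
```

With `mean, cov = joint_moments(PARENTS)`, the call `condition(mean, cov, "john", "ann", 165.0)` yields a mean of 171.25 and a variance of 111.5625, in line with the conditional answer quoted above. Under the same assumptions the unconditional variance `cov["john", "john"]` comes out as 120 rather than 112.45, so the unconditional figure may reflect a different model configuration in the original computation.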

The network in Figure 3 is an example of a Gaussian network, where all 
associated densities are Gaussian. In discrete networks the random variables 
are discrete, and conditional Gaussian (CG) networks (cf. [CDLS99]) are net- 
works involving both discrete and continuous variables. In CG networks every 
discrete variable only depends on discrete variables, and continuous variables 
follow a multivariate Gaussian distribution given the discrete. In general, exact 
solutions of the inference problem are not possible when arbitrary conditional 
densities are employed. This is why CG networks are important. They consti- 
tute an analytically tractable model involving continuous and discrete variables. 
Many exact algorithms solving the inference problem for the tractable cases
exist. Their details are not relevant for the present paper and we refer to the literature
(e.g. [Pea91,RN95,CDLS99]). Furthermore, approximate solutions of models
involving arbitrary conditional densities can always be computed using stochastic
techniques such as Gibbs sampling (see e.g. [JKK95]).

2.3. Definite Clause Logic 

Imagine another family totally separated from the described one. A similar 
Bayesian network would model the height example within that family: its graph- 
ical structure and associated conditional probability distribution are controlled 
by the same intensional regularities. Definite clause logic is a classical framework 
for representing such (logical) intensional regularities. 

A first-order alphabet is a set of predicate symbols and a set of functor
symbols. Constants are functor symbols of arity 0. We assume that at least one
constant is given. A definite clause is a formula of the form $A \leftarrow B_1, \ldots, B_m$ where
A and the $B_i$ are logical atoms. An atom $p(t_1, \ldots, t_n)$ is a predicate symbol p
followed by a bracketed n-tuple of terms $t_i$. A term t is a variable V or a functor
symbol f immediately followed by a bracketed k-tuple of terms $t_i$, i.e. $f(t_1, \ldots, t_k)$. A
definite clause can be read as A if $B_1$ and ... and $B_m$. All variables in definite
clauses are universally quantified, although this is not explicitly written. We call
A the head and $B_1, \ldots, B_m$ the body of the clause. A definite clause program is
a set of definite clauses. Consider e.g. the program height given in Figure 4.
The motivation for this particular choice of program will become clear soon.
Its alphabet consists of the predicate symbol height and the function symbols

3 The answers are computed using the Hugin expert system [Huga].
height(ann) ←
height(cecily) ←
height(brian) ←
height(unknown1) ←
height(unknown2) ←

height(dorothy) ← height(ann), height(brian)
height(eric) ← height(cecily), height(brian)
height(fred) ← height(ann), height(unknown1)
height(gwenn) ← height(ann), height(unknown2)
height(henry) ← height(dorothy), height(fred)
height(irene) ← height(eric), height(gwenn)
height(john) ← height(henry), height(irene)

Figure 4. The definite clause program height. 

unknown1, unknown2, ann, brian, cecily, fred, ... of arity 0. The first five clauses
are called facts, because they have empty bodies.

The set of variables in a term, atom or clause E is denoted as vars(E).
A clause is called range-restricted if all variables occurring in the head of the
clause also occur in its body, i.e. $vars(head(E)) \subseteq vars(body(E))$. All clauses of
our example program are range-restricted. The following clause C is not
range-restricted:

height(X) ← height(Y).

Functor-free clauses are clauses that contain only variables and constants as
terms. A goal is a formula of the form $\leftarrow B_1, \ldots, B_m$. A substitution
$\theta = \{V_1/t_1, \ldots, V_n/t_n\}$, e.g. {X/ann}, is an assignment of terms $t_i$ to variables $V_i$.
Applying a substitution $\theta$ to a term, atom or clause e yields the instantiated
term, atom, or clause $e\theta$ where all occurrences of the variables $V_i$ are
simultaneously replaced by the terms $t_i$, e.g. C{X/ann} yields height(ann) ← height(Y)
with head(C){X/ann} = height(ann). A substitution $\theta$ is called a unifier for a
finite set S of atoms if $S\theta$ is a singleton. A unifier $\theta$ for S is called a most general
unifier (MGU) for S if, for each unifier $\sigma$ of S, there exists a substitution $\gamma$ such
that $\sigma = \theta\gamma$. The substitution {X/ann} is the MGU of {height(ann), height(X)}.
A term, atom or clause E is called ground when there is no variable occurring in
E, i.e. $vars(E) = \emptyset$. All clauses in our example program height are ground.
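
The notions of substitution and most general unifier can be made concrete with a small Python sketch. The encoding is an assumption made for illustration only: variables are strings starting with a capital letter, constants are lowercase strings, and compound terms and atoms are tuples whose first component is the functor or predicate symbol. The unification routine is a textbook one with occurs check, not code from the paper.

```python
def is_variable(term):
    """Variables are strings that start with a capital letter (Prolog convention)."""
    return isinstance(term, str) and term[:1].isupper()


def substitute(term, theta):
    """Apply the substitution theta (dict: variable -> term) simultaneously."""
    if is_variable(term):
        return theta.get(term, term)
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(arg, theta) for arg in term[1:])
    return term  # constant


def occurs(var, term):
    """True if the variable occurs in the term."""
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, arg) for arg in term[1:])


def unify(s, t, theta=None):
    """Return a most general unifier of s and t extending theta, or None."""
    theta = dict(theta or {})
    s, t = substitute(s, theta), substitute(t, theta)
    if s == t:
        return theta
    if is_variable(s):
        if occurs(s, t):
            return None
        # Keep earlier bindings fully instantiated, then add the new one.
        theta = {v: substitute(u, {s: t}) for v, u in theta.items()}
        theta[s] = t
        return theta
    if is_variable(t):
        return unify(t, s, theta)
    if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None
```

In this encoding, `unify(("height", "ann"), ("height", "X"))` returns `{"X": "ann"}`, which corresponds to the MGU {X/ann} mentioned above, and `substitute(("height", "X"), {"X": "ann"})` returns the ground atom `("height", "ann")`.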

The Herbrand base of a definite clause program T, denoted as HB(T), is the 
set of all ground atoms constructed with the predicate and functor symbols in 
the first-order alphabet of T. A Herbrand interpretation is a subset of HB(T). 
The least Herbrand model LH(T) of a definite clause program T is defined as the
set of all facts $f \in HB(T)$ such that T logically entails f, i.e. $T \models f$. The least
Herbrand model LH(T) captures the semantics of the program T. For the above
program height, we have that HB(height) = LH(height), which consists of all the
ground atoms occurring in height.
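
For a ground program such as height, the least Herbrand model can be computed by forward chaining, i.e. by repeatedly adding the head of every clause whose body atoms have already been derived. The following Python sketch assumes ground clauses encoded as (head, body) pairs of strings; the names `HEIGHT_PROGRAM` and `least_herbrand_model` are illustrative.

```python
# The definite clause program height of Figure 4 as (head, body) pairs.
HEIGHT_PROGRAM = [
    ("height(ann)", []),
    ("height(cecily)", []),
    ("height(brian)", []),
    ("height(unknown1)", []),
    ("height(unknown2)", []),
    ("height(dorothy)", ["height(ann)", "height(brian)"]),
    ("height(eric)", ["height(cecily)", "height(brian)"]),
    ("height(fred)", ["height(ann)", "height(unknown1)"]),
    ("height(gwenn)", ["height(ann)", "height(unknown2)"]),
    ("height(henry)", ["height(dorothy)", "height(fred)"]),
    ("height(irene)", ["height(eric)", "height(gwenn)"]),
    ("height(john)", ["height(henry)", "height(irene)"]),
]


def least_herbrand_model(ground_clauses):
    """Forward chaining: iterate until no new ground atom can be derived."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in ground_clauses:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model
```

Applied to `HEIGHT_PROGRAM`, the function returns all twelve ground atoms of the program, which agrees with the statement HB(height) = LH(height). A clause whose body cannot be derived, e.g. the single clause `("p", ["q"])`, contributes nothing to the model.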

3. Merging Bayesian Networks With Definite Clause Logic 

Here, we show how Bayesian networks and definite clause logic are integrated 
within the framework of Bayesian logic programs. Before defining Bayesian logic 
programs formally, we present the key ideas and illustrate them on the height 
example. 

3.1. "Propositional" Bayesian Logic Programs

Besides the papers [Poo93,NH97,Jae97,Kol99], the book of Pat Langley
[Lan95] gives us a hint of how to combine Bayesian networks and definite
clause logic. Langley does not represent Bayesian networks graphically but rather
uses the notation of propositional definite clause programs. In the spirit of Langley,
"propositional" Bayesian logic programs will represent the Bayesian network
of Figure 3 using the clauses of the definite clause program height in Figure 4:
each family (X, Pa(X)) of the Bayesian network corresponds to a clause
X ← Pa(X) in Langley's notation. As a consequence, each random variable of
the Bayesian network corresponds to a predicate (with arity 0, i.e. a proposition)
in the definite clause program. Furthermore, for each random variable there will
be exactly one clause whose head contains the random variable. On our example,
one can now easily verify that the graphical structure of the Bayesian network
corresponds to the dependency graph of the definite clause program. The dependency
graph DG(T) of a definite clause program T is the directed graph whose
nodes correspond to the ground atoms in LH(T)⁴. The graph has an edge from
a node A to a node B if and only if there exists a clause $C \in T$ and a
substitution $\theta$ such that $B = head(C)\theta$, $A \in body(C)\theta$ and for all atoms $A' \in C\theta$:
$A' \in LH(T)$. The dependency graph DG(height) for our example program height
is given in Figure 3, which shows the Bayesian network of the height example.
Thus, the dependency graph captures the qualitative component of a Bayesian
network. To capture the quantitative component, we have to associate to each
clause X ← Pa(X) the conditional probability densities cpd(X) of the
corresponding random variable X. Similarly, we will carry over the domains D(X) of
the random variables X to the predicates of the Bayesian logic program. This
actually implies that the predicates and clauses in the Bayesian logic program can
4 The usual definition (see e.g. [AHV95]) considers the ground atoms in the Herbrand base. We 
use the least Herbrand model in order to put the idea of Bayesian logic programs across. 
be interpreted in two different manners. First, there is the logical interpretation,
which interprets the clauses and predicates as (propositional) definite clauses
and logical predicates. Secondly, there is the Bayesian interpretation, which
considers the (propositional) definite clause program as a Bayesian network and the
(ground) atoms as the random variables.
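
The dependency graph of a ground program can be read off its clauses directly. The following Python sketch assumes a compact, hypothetical encoding `FAMILY` of the program height (child mapped to the individuals in the clause body) and takes the least Herbrand model as an argument; for height this model is simply the set of all ground atoms of the program. The names `dependency_graph` and `parents` are illustrative.

```python
# Hypothetical compact encoding of the program height: head individual -> body individuals.
FAMILY = {
    "ann": [], "cecily": [], "brian": [], "unknown1": [], "unknown2": [],
    "dorothy": ["ann", "brian"],
    "eric": ["cecily", "brian"],
    "fred": ["ann", "unknown1"],
    "gwenn": ["ann", "unknown2"],
    "henry": ["dorothy", "fred"],
    "irene": ["eric", "gwenn"],
    "john": ["henry", "irene"],
}
GROUND_CLAUSES = [
    (f"height({child})", [f"height({p})" for p in body]) for child, body in FAMILY.items()
]


def dependency_graph(ground_clauses, model):
    """Edges (A, B) for every ground clause B <- ..., A, ... whose atoms all lie in the model."""
    edges = set()
    for head, body in ground_clauses:
        if head in model and all(atom in model for atom in body):
            edges.update((atom, head) for atom in body)
    return edges


def parents(edges, node):
    """Direct predecessors of a node, i.e. Pa(node) in the Bayesian reading."""
    return sorted(a for a, b in edges if b == node)
```

With `model = {head for head, _ in GROUND_CLAUSES}` and `edges = dependency_graph(GROUND_CLAUSES, model)`, the call `parents(edges, "height(irene)")` returns the atoms for eric and gwenn, matching the family of Height(irene) in the Bayesian network.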

The key idea underlying Bayesian logic programs is to generalize
Langley's notation towards first order logic by considering first order definite
clauses instead of propositional ones, i.e. to allow for logical variables, and to
interpret the dependency graph of a definite clause program as the graphical
structure of a Bayesian network.

3.2. Bayesian Logic Programs 

Let us now define the key concept of Bayesian logic programs. 

Definition 3.1 (Bayesian (definite) clause). A Bayesian (definite) clause c is 
an expression of the form 

$$A \mid A_1, \ldots, A_n \tag{3.1}$$

where $n \ge 0$, the $A, A_1, \ldots, A_n$ are Bayesian atoms and all atoms are (implicitly)
universally quantified.

So, the differences between a Bayesian and a definite clause are:

• the atoms $r(t_1, \ldots, t_n)$ and predicates are Bayesian, which means that they
have an associated domain D(r), and

• we use "|" instead of "←" to capture the idea of conditional probability
densities.

Note that the domain D(r) is unique for the predicate r. Furthermore, all other 
logical notions carry over to Bayesian logic programs. So we will speak of Bayesian 
predicates, terms, constants, functors, substitutions, ground Bayesian clauses, 
etc. Throughout the paper, we will use Prolog notation to write down clauses. 
Variables start with a capital; constant, functor and predicate symbols start with 
a lowercase. 

Example 3.2. The Bayesian clause c 

height(X) | mother(Y,X),father(Z,X),height(Y),height(Z) (3.2) 

defines the height of an individual X in terms of the heights of its mother Y and 
its father Z. The Bayesian predicates of c are height, mother and father. The 
domains of these predicates are D(father) = D(mother) = {true, false} and
$D(height) = \mathbb{R}$. □

Intuitively, a Bayesian predicate generically represents a set of random variables, 
and each Bayesian ground atom uniquely represents a random variable. To each 
Bayesian clause c we associate the conditional probability densities cpd(c). 
mother(Y,X) = u_1    father(Z,X) = u_2    cpd(c)(h | u_1, u_2, u_3, u_4)

true                 true                 $\mathcal{N}(h, \tfrac{1}{2}(u_3 + u_4), 60)$
true                 false                $\mathcal{N}(h, u_3, 60)$
false                true                 $\mathcal{N}(h, u_4, 60)$
false                false                $\mathcal{N}(h, 175, 60)$

Table 1. The conditional probability densities cpd(c) associated to the Bayesian clause c of example 3.2:
height(X) | mother(Y,X), father(Z,X), height(Y), height(Z). The parameters $u_3$, $u_4$ refer to the
heights of individual X's parents.

Definition 3.3 (associated conditional probability densities). Let c be the 
Bayesian clause 

$$r(t_1, \ldots, t_n) \mid s_1(t_{1,1}, \ldots, t_{1,n_1}), \ldots, s_m(t_{m,1}, \ldots, t_{m,n_m}).$$

The conditional probability densities cpd(c) specify for each $(u_1, \ldots, u_m) \in D(s_1) \times \ldots \times D(s_m)$
a function $cpd(c)(u \mid u_1, \ldots, u_m) : D(r) \to [0, 1]$ with

$$cpd(c)(u \mid u_1, \ldots, u_m) = p(r(t_1, \ldots, t_n) = u \mid s_1(t_{1,1}, \ldots, t_{1,n_1}) = u_1, \ldots, s_m(t_{m,1}, \ldots, t_{m,n_m}) = u_m).$$

Because all Bayesian ground atoms $r\theta$ over a Bayesian predicate r inherit their
domains from r, i.e. $D(r\theta) := D(r)$, the densities cpd(c) generically represent the
conditional probability densities of all ground instances $c\theta$. Thus, cpd(c) specifies
the quantitative component of c.

Example 3.4 (continuing example 3.2). The conditional probability densities
cpd(c) associated to the Bayesian clause c of the last example are given in Table 1.
Let $\theta$ = {X/john, Y/irene, Z/henry} be a substitution. The ground
instance $c\theta$ specifies the conditional probability densities:

p(height(john) | mother(irene, john), height(irene),
father(henry, john), height(henry)).

□ 
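
The densities of Table 1 can be written as a single function of the four body atoms. The following Python sketch is one possible encoding, in which the boolean arguments stand for the states of mother(Y,X) and father(Z,X) and the real-valued arguments for the states of height(Y) and height(Z); the function names are illustrative and not taken from the paper.

```python
import math


def normal_density(x, mean, variance):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def cpd_height(h, u1, u2, u3, u4):
    """cpd(c)(h | u1, u2, u3, u4) following the four rows of Table 1.

    u1, u2: states of mother(Y,X) and father(Z,X) (booleans);
    u3, u4: states of height(Y) and height(Z) (real numbers).
    """
    if u1 and u2:
        mean = 0.5 * (u3 + u4)
    elif u1:
        mean = u3
    elif u2:
        mean = u4
    else:
        mean = 175.0
    return normal_density(h, mean, 60.0)
```

For the ground instance of Example 3.4 with both relational atoms true and parental heights 164 and 173, `cpd_height(h, True, True, 164.0, 173.0)` is the density $\mathcal{N}(h, 168.5, 60)$; if both relational atoms are false, the function falls back to the a priori density.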

At this point the reader may see some further connections to Bayesian
networks. Indeed, reconsider the last example. The random variables
mother(irene, john), father(henry, john), height(irene), height(henry) directly
influence height(john). More generally, any clause c specifies a direct probabilistic
influence of each ground atom in $body(c)\theta$ on $head(c)\theta$ for any ground instance
$c\theta$, when all ground atoms in $c\theta$ are true.

So far, we have however ignored one important complication. When
representing a Bayesian network as a set of propositional clauses, there will be exactly
one clause that defines each Bayesian predicate (i.e. the clause containing the
predicate in the head). In Bayesian logic programs, one may have two clauses $c_1$
and $c_2$ and corresponding substitutions $\theta_i$ that ground the clauses $c_i$ such that
$head(c_1\theta_1) = head(c_2\theta_2)$. This can lead to problems as illustrated in the following
example.

Example 3.5. Consider the Bayesian clauses 

height(X) | mother(Y, X), height(Y).
and height(X) | father(Y, X), height(Y).

in the light of the substitutions $\theta_1$ = {X/jef, Y/mary} and $\theta_2$ = {X/jef, Y/john}.
The ground clauses $c_1\theta_1$ and $c_2\theta_2$ specify

p(height(jef) | mother(mary, jef), height(mary))
and p(height(jef) | father(john, jef), height(john))

but they do not specify the needed conditional probability densities:

p(height(jef) | mother(mary, jef), height(mary),
father(john, jef), height(john))

□ 

Notice at this point that the two clauses $c_1$, $c_2$ may be identical! The standard
solution for obtaining the required distribution is the use of so-called combining rules⁵. Our
notion closely follows [NH97] and differs mainly in the restriction that the input
set is finite. We make this assumption for computational reasons and will only
weaken this restriction when embedding pure Prolog programs (cf. section 7.2).

Definition 3.6 (Combining rule). A combining rule is any algorithm that maps 
every finite set of conditional probability densities 

$$\{ p(A \mid A_{i1}, \ldots, A_{in_i}) \mid 1 \le i \le m,\ m > 0 \}, \tag{3.3}$$

m > 1, onto the conditional probability densities

$$p(A \mid B_1, \ldots, B_n), \tag{3.4}$$

called the combined conditional probability densities, with
$\{B_1, \ldots, B_n\} = \bigcup_{i=1}^{m} \{A_{i1}, \ldots, A_{in_i}\}$ and $n < \infty$. Its outputs are empty if and only if its inputs
are empty.

We have claimed the equality $\{B_1, \ldots, B_n\} = \bigcup_{i=1}^{m} \{A_{i1}, \ldots, A_{in_i}\}$ for the sake of
simplicity⁶. It will make the specification of probabilistic influences among random
variables, definition 4.4, and the formulation of an independency assumption,
assumption 4.6, straightforward. In any case, combining rules can be seen as a
generalization of the idea of canonical distributions (cf. [RN95, page 443]), that
is, the relationship between the parents and a child fits some standard pattern,
and of local probabilistic models, i.e. finer-grained structure within the associated
conditional probability densities, see e.g. [BFGK96].

5 Similar concepts are used in other proposals, e.g. combination functions [Jae97] or aggregate
functions [Kol99].

6 This is no restriction. Assume $\mathbf{B} = \{B_1, \ldots, B_n\}$ to be a proper subset of
$\mathbf{A} = \bigcup_{i=1}^{m} \{A_{i1}, \ldots, A_{in_i}\}$. Then, we replace the original combining rule with a combining rule that
outputs the combined densities $f(A \mid \mathbf{A})$ where $\forall c \in D(\mathbf{A} \setminus \mathbf{B}) : f(A \mid \mathbf{B}, \mathbf{A} \setminus \mathbf{B} = c) = p(A \mid \mathbf{B})$.
This is always possible.

Example 3.7. The functional formulation of the combining rule max, a rule
which will be useful for embedding pure Prolog programs into Bayesian logic
programs, is

$$\max\{ p(A \mid A_{i1}, \ldots, A_{in_i}) \mid i = 1, \ldots, n \} = p(A \mid \textstyle\bigcup_{i=1}^{n} \{A_{i1}, \ldots, A_{in_i}\}) := \max_{i=1}^{n} \{ p(A \mid A_{i1}, \ldots, A_{in_i}) \}.$$

Another combining rule for Bayesian predicates having a boolean domain, e.g.
{true, false}, is noisy_or. It is widely used in the Uncertainty in AI community.
The rule noisy_or generalizes (see [RN95]) the logical or under three assumptions:
(1) each cause has an independent chance of causing the effect, (2) all possible
causes are listed and (3) whatever inhibits a parent $pa_1$ from causing the child is
independent of whatever inhibits another parent $pa_2$, $pa_1 \ne pa_2$, from causing the
child. Formally, the input $\{p(A \mid A_i) \mid 1 \le i \le m\}$ over boolean random variables
$A, A_1, \ldots, A_m$ is mapped onto $p(A \mid A_1, \ldots, A_m)$ with

$$p(A = false \mid A_1 = a_1, \ldots, A_m = a_m) = \prod_{k=1}^{m} (1 - p(A = true \mid A_k = true))^{a_k}$$

$$p(A = true \mid A_1 = a_1, \ldots, A_m = a_m) = 1 - p(A = false \mid A_1 = a_1, \ldots, A_m = a_m)$$

for $a_i \in D(A_i)$, $i = 1, \ldots, m$, and where

$$(x)^a = \begin{cases} x & : a = true, \\ 1 & : a = false. \end{cases}$$

As a last illustration, we consider a real random variable X in a CG network. A
rule, which computes for each joint state of the discrete parents of X a weighted
sum of the states of the continuous parents of X and sets this as mean of the
Gaussian density of X, models a kind of regression. □
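
The noisy_or rule can be sketched in a few lines of Python. The sketch assumes that each input density p(A | A_k) is summarized by the single number p(A = true | A_k = true) and that a cause in state false does not contribute, as in the case distinction for $(x)^a$ above; the function name and argument layout are illustrative.

```python
def noisy_or(cause_probabilities, states):
    """Return p(A = true | A_1 = a_1, ..., A_m = a_m) under the noisy_or rule.

    cause_probabilities[k] is p(A = true | A_k = true);
    states[k] is the boolean state a_k of the k-th parent.
    """
    p_false = 1.0
    for p_k, a_k in zip(cause_probabilities, states):
        if a_k:  # factors of parents in state false are 1
            p_false *= 1.0 - p_k
    return 1.0 - p_false
```

For two active causes with probabilities 0.8 and 0.5 the result is 1 - 0.2 · 0.5 = 0.9. If all probabilities are 1, the rule reduces to the logical or of the parent states.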

By now we are able to formally define the notion of a Bayesian logic program. 

Definition 3.8 (Bayesian logic program). A Bayesian logic program B consists
of a finite set of Bayesian clauses. To each Bayesian clause c there is exactly one
cpd(c) associated, and for each Bayesian predicate r there is exactly one associated
combining rule comb(r).

Definition 3.9 (corresponding logic program). Let B be a Bayesian logic
program. The set of logical definite clauses corresponding to the set of Bayesian
clauses of B is called the corresponding logic program B.

Attention should be paid to the fact that the definition allows for functor
symbols to be used. In this respect Bayesian logic programs differ from
probabilistic relational models [Kol99] and relational Bayesian networks [Jae97],
which restrict themselves to functor-free languages. This corresponds to using
pure Prolog instead of Datalog.

Example 3.10. The following Bayesian logic program height models our
genetic domain (Section 4 will prove this):

father(unknown1,fred).  mother(ann,fred).     father(brian,dorothy).
mother(ann,dorothy).    father(brian,eric).   mother(cecily,eric).
father(unknown2,gwenn). mother(ann,gwenn).    father(fred,henry).
mother(dorothy,henry).  father(eric,irene).   mother(gwenn,irene).
father(henry,john).     mother(irene,john).

height(ann).      height(brian).    height(cecily).
height(unknown1). height(unknown2).

height(X) | mother(Y,X), father(Z,X), height(Y), height(Z).

We associate to each Bayesian predicate the identity as combining rule and
to each Bayesian ground fact over mother or father conditional probability
densities of the form:

    P(mother(X,Y))              P(father(X,Y))
    true    false       and     true    false
    1.0     0.0                 1.0     0.0

The associated densities of each Bayesian ground fact over height resemble
equations 2.4 and 2.5, and the associated densities of the remaining clause are
given in example 3.4. □

To summarize, we have introduced Bayesian logic programs. They combine 
definite clause logic with Bayesian networks by establishing a one-to-one mapping 
between ground atoms and random variables. Thus, a logical variable remains a 
logical variable. This separates Bayesian logic programs from existing knowledge- 
based model construction approaches such as probabilistic logic programs [NH97], 
relational Bayesian networks [Jae97] and probabilistic relational models [Kol99]. 
In the latter frameworks, ground atoms represent states of random variables. 
Bayesian clauses generically specify possible direct influences. The associated
conditional probability densities together with the associated combining rules
probabilistically quantify these influences. Thus, Bayesian logic programs nicely
separate the quantitative and the qualitative components.

4. Declarative Semantics 

Intuitively, each Bayesian logic program B specifies a (possibly infinite) 
Bayesian network, i.e. a joint probability density over a countable set of random 
variables. This view implicitly assumes that all knowledge about the domain of 
discourse is encoded in the Bayesian logic program (e.g. the horses belonging 
to a farm or to a pedigree). If the domain of discourse changes (e.g. the horses 
under consideration), then part of the Bayesian logic program has to be changed. 
Usually, these modifications will only concern Bayesian ground atoms (e.g. the 
Bayesian ground atoms over "mother", "father"). This is akin to the extensional
facts of a database. The clauses then correspond to intensional rules. 

The semantics of definite clause programs has been well studied (see e.g.
[Llo89]). The main result is that the intended meaning of a definite clause
program, such as $\overline{B}$, is represented by its least Herbrand model
$\mathrm{LH}(\overline{B}) \subseteq \mathrm{HB}(\overline{B})$, which contains all
ground atoms of the Herbrand base $\mathrm{HB}(\overline{B})$ that are logically
entailed by the program. If we ignore termination issues, it can in principle
be computed by a theorem prover such as Prolog. Various methods exist to
compute the least Herbrand model. We merely sketch its computation through
the use of the well-known immediate consequence operator $T_{\overline{B}}$
(cf. [Llo89]). For simplicity, we will assume that all clauses in a Bayesian
logic program are range-restricted. A clause is range-restricted iff all
variables occurring in the head also occur in the body. Range restriction is
often imposed in computational logic. It avoids the derivation of non-ground
true facts, i.e. all facts entailed by the program are ground.

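Definition 4.1 (immediate consequence operator). Let B be a Bayesian logic
program and $\mathcal{I}$ a Herbrand interpretation over $\overline{B}$. The
immediate consequence operator $T_{\overline{B}}$ defined by $\overline{B}$ is the
function on the set of all Herbrand interpretations of $\overline{B}$ such that
for any such interpretation $\mathcal{I}$ we have

\[
T_{\overline{B}}(\mathcal{I}) = \{ A\theta \mid
  \text{there is a substitution } \theta \text{ and a clause }
  A \mid A_1, \dots, A_n \text{ in } \overline{B} \text{ such that }
  A\theta \mid A_1\theta, \dots, A_n\theta \text{ is ground and for all }
  i \in \{1, \dots, n\}: A_i\theta \in \mathcal{I} \}.
\]

Definition 4.2 (least Herbrand model). Let B be a Bayesian logic program. The
least Herbrand model $\mathrm{LH}(\overline{B})$ is defined as the least fixpoint
of $T_{\overline{B}}$ applied on $\emptyset$, i.e.
$T_{\overline{B}}(\mathrm{LH}(\overline{B})) = \mathrm{LH}(\overline{B})
 = T_{\overline{B}}(T_{\overline{B}}(\dots T_{\overline{B}}(\emptyset) \dots))$.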
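
Definitions 4.1 and 4.2 can be illustrated with a naive bottom-up sketch. It is
a minimal sketch under stated assumptions, not an efficient evaluator: atoms
are encoded as tuples, arguments starting with an upper-case letter are treated
as logical variables, clauses are assumed to be range-restricted and
functor-free, and the names PEDIGREE, FOUNDERS, match, t_operator and
least_herbrand_model are illustrative. The clause set is the height program of
example 3.10.

```python
# Atoms are tuples such as ("height", "ann"); capitalised arguments are variables.
PEDIGREE = {  # child: (mother, father), read off example 3.10
    "fred": ("ann", "unknown1"), "dorothy": ("ann", "brian"),
    "eric": ("cecily", "brian"), "gwenn": ("ann", "unknown2"),
    "henry": ("dorothy", "fred"), "irene": ("gwenn", "eric"),
    "john": ("irene", "henry"),
}
FOUNDERS = ["ann", "brian", "cecily", "unknown1", "unknown2"]

CLAUSES = [(("height", f), []) for f in FOUNDERS]
for child, (mother, father) in PEDIGREE.items():
    CLAUSES.append((("mother", mother, child), []))
    CLAUSES.append((("father", father, child), []))
CLAUSES.append((("height", "X"),
                [("mother", "Y", "X"), ("father", "Z", "X"),
                 ("height", "Y"), ("height", "Z")]))


def match(pattern, fact, theta):
    """Extend theta so that pattern instantiated by theta equals fact, else None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for arg, value in zip(pattern[1:], fact[1:]):
        if arg[0].isupper():                      # logical variable
            if theta.setdefault(arg, value) != value:
                return None
        elif arg != value:                        # constant mismatch
            return None
    return theta


def t_operator(clauses, interpretation):
    """One application of the immediate consequence operator."""
    derived = set()
    for head, body in clauses:
        thetas = [{}]
        for atom in body:                         # join body atoms against I
            thetas = [t for theta in thetas for fact in interpretation
                      if (t := match(atom, fact, theta)) is not None]
        for theta in thetas:
            derived.add((head[0],) + tuple(theta.get(a, a) for a in head[1:]))
    return derived


def least_herbrand_model(clauses):
    """Iterate T from the empty interpretation up to its least fixpoint."""
    interpretation = set()
    while True:
        nxt = t_operator(clauses, interpretation)
        if nxt == interpretation:
            return interpretation
        interpretation = nxt


LH = least_herbrand_model(CLAUSES)
print(len(LH), ("height", "john") in LH)
```

The loop terminates here because the program is functor-free and therefore has
a finite Herbrand base; with functor symbols the iteration need not stop.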






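Example 4.3. The least Herbrand model of the corresponding logic program
$\overline{B}$ of the Bayesian logic program in example 3.10 coincides with
$\mathcal{I}_4$, because each application of $T_{\overline{B}}$ adds the heights
of exactly those individuals whose parents' heights have already been derived:

\[
\begin{aligned}
\mathcal{I}_1 &:= T_{\overline{B}}(\emptyset) = \text{the set of all ground facts in } \overline{B} \\
\mathcal{I}_2 &:= T_{\overline{B}}(\mathcal{I}_1) = \mathcal{I}_1 \cup \{height(fred), height(dorothy), height(eric), height(gwenn)\} \\
\mathcal{I}_3 &:= T_{\overline{B}}(\mathcal{I}_2) = \mathcal{I}_2 \cup \{height(henry), height(irene)\} \\
\mathcal{I}_4 &:= T_{\overline{B}}(\mathcal{I}_3) = \mathcal{I}_3 \cup \{height(john)\} \\
\mathcal{I}_5 &:= T_{\overline{B}}(\mathcal{I}_4) = \mathcal{I}_4
\end{aligned}
\]

□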

Now, due to the one-to-one mapping between logical ground atoms, Bayesian
ground atoms and random variables, there exists exactly one set of random
variables corresponding to $\mathrm{HB}(\overline{B})$ as well as exactly one set
corresponding to $\mathrm{LH}(\overline{B})$. We define the former set as the
Herbrand base HB(B) and the latter set as the least Herbrand model LH(B) of the
Bayesian logic program B. As for definite clause programs, HB(B) constitutes all
random variables we can talk about given B, and LH(B) specifies the proper
random variables; these are the ones for which (conditional) probability
densities are (well-)defined. Likewise, all other logical notions carry over to
Bayesian logic programs. So, we will speak of Bayesian predicates, terms,
constants, substitutions, ground Bayesian clauses, dependency graph etc.

So far, we have only characterized the proper random variables of discourse, 
i.e. the nodes in the Bayesian network. What is left is to introduce local influences 
among them, i.e. the edges in the Bayesian network. A natural candidate as 
medium for this is the already used immediate consequence operator. 

Definition 4.4 (direct influence^7). Let B be a Bayesian logic program. A
random variable C directly influences a random variable A if and only if

1. $A, C \in \mathrm{LH}(B)$ and

2. there is a Bayesian clause $A' \mid A_1, \dots, A_n$ in B and a substitution
   $\theta$ that grounds the clause such that $A = A'\theta$ and $C = A_i\theta$ for
   some i, $1 \le i \le n$, and $A_j\theta \in \mathrm{LH}(B)$ for all $1 \le j \le n$.

The set of random variables directly influencing a random variable
$A \in \mathrm{LH}(B)$ is denoted as Pa(A), the parents of A. The recursive closure
of the direct influence relation over LH(B) defines the influence relation.

Roughly speaking, a random variable C influences a random variable A whenever 
there is a proof of A that relies on C. 
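
For the height program, the parents Pa(A) and the influence relation can be
computed directly from the ground instances of the single non-fact clause. The
sketch below assumes that every listed ground instance has all of its atoms in
LH(B), which holds for example 3.10; atoms are plain strings and the helper
names ground_instances, parents and influencers are illustrative.

```python
PEDIGREE = {  # child: (mother, father), read off example 3.10
    "fred": ("ann", "unknown1"), "dorothy": ("ann", "brian"),
    "eric": ("cecily", "brian"), "gwenn": ("ann", "unknown2"),
    "henry": ("dorothy", "fred"), "irene": ("gwenn", "eric"),
    "john": ("irene", "henry"),
}


def ground_instances(pedigree):
    """Ground instances of the height clause whose atoms all lie in LH(B)."""
    for child, (mother, father) in pedigree.items():
        yield (f"height({child})",
               [f"mother({mother},{child})", f"father({father},{child})",
                f"height({mother})", f"height({father})"])


def parents(instances):
    """Pa(A): every body atom of a ground clause with head A directly influences A."""
    pa = {}
    for head, body in instances:
        pa.setdefault(head, set()).update(body)
        for atom in body:
            pa.setdefault(atom, set())  # ground facts end up without parents
    return pa


def influencers(pa, atom):
    """All C that influence atom: closure of the direct influence relation."""
    seen, stack = set(), list(pa[atom])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(pa[c])
    return seen


PA = parents(ground_instances(PEDIGREE))
print(sorted(PA["height(john)"]))
print("height(ann)" in influencers(PA, "height(john)"))
```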

Example 4.5 (continuing our running example). Figure 5 shows the direct
influence relation of the height example. The filled nodes correspond to the nodes
of the Bayesian network in Figure 3. □

Figure 5. The direct influence relation induced over the least Herbrand model of the Bayesian
logic program height of example 3.10. The filled nodes correspond to the nodes of the Bayesian
network of Figure 3 modelling the stud farm example.

An induction over the cardinality of LH(B) shows that the dependency graph
DG(B) graphically represents the influenced by relation. Using the influenced by
relation we are able to state a conditional independency assumption similar to
that of Bayesian networks, cf. assumption 2.1:

Assumption 4.6 (independency^7). Each node $A \in \mathrm{LH}(B)$ in the dependency
graph DG(B) is conditionally independent of any subset
$\mathcal{A} \subseteq \mathrm{LH}(B)$ of nodes that are not descendants of A given a
joint state of Pa(A), i.e.
$p(A \mid \mathcal{A}, \mathrm{Pa}(A)) = p(A \mid \mathrm{Pa}(A))$.

Example 4.7. The direct influence relation of our running example shown in
Figure 5 encodes e.g. the independency

p(height(john) | mother(irene,john), height(irene),
                 father(henry,john), height(henry), C)
  = p(height(john) | mother(irene,john), height(irene),
                     father(henry,john), height(henry)),

where C is any set of variables influencing height(john). □

7 Without the claimed equality in the definition 3.6 of combining rules one would have to take
the combined conditional probability densities into account. The influenced by relation would
become a subset of the currently employed one.

So far, we have ignored one important requirement: the influenced by relation
should be acyclic in order to obtain a well-defined Bayesian network, i.e. the
dependency graph DG(B) should be acyclic (in the usual graph-theoretical sense).
Now, the network or graph can only be cyclic if there exists a proper random
variable that influences itself. However, if there exists such a proper random
variable A, executing the query ?- A. (using Prolog) would also be problematic.
The SLD tree (see [Llo89] and Section 6) of the query would be infinite and
the query may not terminate^8. Thus in such cases the logical component of the
Bayesian logic program is itself problematic. With this in mind, we can formulate
Theorem 4.9. The idea is to interpret the dependency graph of a Bayesian logic
program B as the graphical structure of a (possibly infinite) Bayesian network.

Definition 4.8 (well-defined Bayesian logic program). Let B be a Bayesian logic
program. If

1. $\mathrm{LH}(B) \neq \emptyset$,

2. the dependency graph DG(B) is acyclic, and

3. each random variable in LH(B) is only influenced by a finite set of random
   variables,

then B is called well-defined.

The height Bayesian logic program in example 3.10 is well-defined. 
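
For a finite, already grounded dependency graph the three conditions of
definition 4.8 can be checked mechanically. The following sketch assumes the
graph is given as a dictionary from each proper random variable to its parents
(a hypothetical encoding, as are the names is_acyclic and is_well_defined); the
third condition then holds trivially, so only non-emptiness and acyclicity are
tested. It does not cover programs with infinitely many ground atoms.

```python
def is_acyclic(pa):
    """Depth-first search with three colours; a grey parent closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {a: WHITE for a in pa}

    def visit(a):
        colour[a] = GREY
        for p in pa[a]:
            if colour[p] == GREY or (colour[p] == WHITE and not visit(p)):
                return False
        colour[a] = BLACK
        return True

    return all(colour[a] != WHITE or visit(a) for a in pa)


def is_well_defined(pa):
    """Conditions 1 and 2 of definition 4.8; condition 3 is trivial for a finite dict."""
    return bool(pa) and is_acyclic(pa)


# Fragment of the height program around height(fred).
HEIGHT_FRAGMENT = {
    "height(fred)": {"mother(ann,fred)", "father(unknown1,fred)",
                     "height(ann)", "height(unknown1)"},
    "mother(ann,fred)": set(), "father(unknown1,fred)": set(),
    "height(ann)": set(), "height(unknown1)": set(),
}
# Ground instance r(a) | r(a) of a clause r(X) | r(X): r(a) influences itself.
SELF_LOOP = {"r(a)": {"r(a)"}}

print(is_well_defined(HEIGHT_FRAGMENT), is_well_defined(SELF_LOOP))
```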

Theorem 4.9 (declarative semantics). Every well-defined Bayesian logic pro- 
gram B specifies a unique probability measure over LH(B). 

Proof. This proof can be skipped without loss of continuity. Background mate- 
rial and definitions from probability theory are introduced in Section A in the 
appendix. 

Let B be a well-defined Bayesian logic program. We can assume that all
proper random variables are real random variables, i.e.
$\forall A \in \mathrm{LH}(B) : D(A) = \mathbb{R}$. The existence and the
uniqueness of LH(B) is guaranteed [Llo89], and LH(B) is countable [Doe94]. We
will show that B specifies a projective family of probability measures
(cf. Appendix A): A family
$(\mathbb{R}^I, \mathcal{B}^I, P_I)_{I \in \mathcal{H}(T)}$ (where T is a non-empty,
countable set) of probability spaces is called projective if for all
$H, J \in \mathcal{H}(T)$ with $J \subseteq H$ the equation
$P_J = \mathrm{proj}^H_J(P_H)$ holds, where $\mathcal{H}(T)$ is the set of all
non-empty, finite subsets of T, and $\mathrm{proj}^H_J$ is the projection of
$\mathbb{R}^H$ onto $\mathbb{R}^J$. Then, the existence of a unique probability
measure over LH(B) follows from Kolmogorov's theorem.

8 Termination depends on the search strategy of the theorem prover.

Because B is well-defined, we can assign to each random variable
$A \in \mathrm{LH}(B)$ a finite rank:

1. rank(A) = 0, if no random variable in LH(B) is influencing A.

2. $\mathrm{rank}(A) = \max\{\mathrm{rank}(D) \mid D \in \mathrm{LH}(B) \wedge D
   \text{ is directly influencing } A\} + 1$, otherwise.
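
The rank assignment can be sketched for the height program as follows. The
dictionary PA (parents of each proper random variable) is a hypothetical
encoding built from the pedigree of example 3.10, and breaking ties between
variables of equal rank by name is an arbitrary choice made here to obtain one
rank-respecting total order.

```python
from functools import lru_cache

PEDIGREE = {  # child: (mother, father), read off example 3.10
    "fred": ("ann", "unknown1"), "dorothy": ("ann", "brian"),
    "eric": ("cecily", "brian"), "gwenn": ("ann", "unknown2"),
    "henry": ("dorothy", "fred"), "irene": ("gwenn", "eric"),
    "john": ("irene", "henry"),
}

PA = {}  # Pa(A) for every proper random variable A
for child, (mother, father) in PEDIGREE.items():
    body = [f"mother({mother},{child})", f"father({father},{child})",
            f"height({mother})", f"height({father})"]
    PA.setdefault(f"height({child})", set()).update(body)
    for atom in body:
        PA.setdefault(atom, set())


@lru_cache(maxsize=None)
def rank(atom):
    """0 for variables without parents, else 1 + the maximal rank of a parent."""
    return 1 + max(map(rank, PA[atom])) if PA[atom] else 0


# One total order that respects rank: sort by rank, break ties by name.
ORDER = sorted(PA, key=lambda a: (rank(a), a))
print(rank("height(john)"), ORDER[-1])
```

The recursion is finite only because the dependency graph is acyclic and every
variable has finitely many influencers, which is exactly what well-definedness
provides.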

Indeed, rank induces a total order $\pi$ over LH(B). So, let $(A_n)_{n \in T}$ (for
some index set T) be the sequence of random variables in LH(B) in ascending order
according to $\pi$. We show now that B specifies for each $J \in \mathcal{H}(T)$ a
unique probability space.

The set of random variables constituting the space desired is
$A(J) = \{A_{j_1}, \dots, A_{j_m}\}$, and, hence, the measurable space is
$(\mathbb{R}^J, \mathcal{B}^J)$. Furthermore, B defines for each
$A \in \mathrm{LH}(B)$ exactly one associated (combined) conditional probability
density cpd(A | Pa(A)). These densities could be seen as constraints on the
probability measure $P_J$ over $(\mathbb{R}^J, \mathcal{B}^J)$. For that purpose,
take the completion C(J) of A(J) with respect to the influenced by relation over
LH(B) into account:

\[
C(J) := \{D \in \mathrm{LH}(B) \mid D \text{ influences some } A \in A(J)\} \tag{4.1}
\]

The set C(J) is always finite because B is well-defined. An induction over |C(J)|
shows that C(J) together with the densities cpd(D | Pa(D)), $D \in C(J)$, uniquely
induce a Bayesian network N(J): the nodes of N(J) are C(J) and its edges are
given by the direct influence relation. The network N(J) specifies the unique
probability densities $p_{C(J)}$ (and therefore a unique probability measure
$P_{C(J)}$) over $(\mathbb{R}^K, \mathcal{B}^K)$, $K \in \mathcal{H}(T)$, as follows:

\[
p_{C(J)}(C(J)) = \prod_{A \in C(J)} \mathrm{cpd}(A \mid \mathrm{Pa}(A)). \tag{4.2}
\]

Now, the probability densities $p_J$ are marginalized densities of $p_{C(J)}$, i.e.

\[
p_J(A(J)) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}
  p_{C(J)}(C(J)) \, dD \tag{4.3}
\]

where $D = C(J) \setminus A(J)$. Let $P_J$ be the probability measure uniquely
specified by $p_J$. Then $(\mathbb{R}^J, \mathcal{B}^J, P_J)$ is the unique
probability space desired.

The densities $p_{C(J)}$ are the key to prove that the family
$(\mathbb{R}^I, \mathcal{B}^I, P_I)_{I \in \mathcal{H}(T)}$ of probability spaces is
projective. Let $H = \{h_1, \dots, h_n\} \in \mathcal{H}(T)$ with $J \subseteq H$
and $A(H) = \{A_{h_1}, \dots, A_{h_n}\}$. Performing the same steps for A(H) as
for A(J) we obtain a Bayesian network N(H). An induction over |N(J)| proves that
N(J) is a subnetwork of N(H). Thus

\[
p_{C(J)}(C(J)) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}
  p_{C(H)}(C(H)) \, dD' \tag{4.4}
\]

where $D' = C(H) \setminus C(J)$. Because (4.2) and (4.3) apply to both A(J) and
A(H), it follows that $p_J$ are marginalized densities of $p_H$, the densities
over $(\mathbb{R}^H, \mathcal{B}^H)$. This, together with the fact that N(J) is a
subnetwork of N(H), means that the family
$(\mathbb{R}^I, \mathcal{B}^I, P_I)_{I \in \mathcal{H}(T)}$ of probability spaces is
projective. It then follows from Kolmogorov's theorem that there exists a unique
probability measure $P_T$ over $(\mathbb{R}^T, \mathcal{B}^T)$ which satisfies all
constraints given by the associated (combined) conditional probability densities.
This proves the theorem because the total order $\pi$ could be any
rank-respecting total order. □

Not every Bayesian logic program is well-defined. Let us investigate some exam- 
ples of ill-defined programs. 

Example 4.10. The Bayesian logic program

r(X) | s(X).

is not well-defined because its least Herbrand model is empty. The program

r(a).
s(a,b).
r(X) | r(X).
r(X) | s(X,f(Y)).
s(X,f(Y)) | s(X,Y).

is ill-defined as well because the random variable r(a) is directly influenced by
s(a,f(b)), s(a,f(f(b))), ... and by itself. The Bayesian logic program

r(a).
s(a).
r(X) | r(f(X)).
r(f(X)) | s(f(X)).
s(f(X)) | s(X).

is ill-defined, because the random variable r(a) is influenced by
r(f(a)), r(f(f(a))), ... though r(a) has a finite proof. The directly influenced
by relations of the two latter programs are shown in Figure 6. □

For the rest of the paper, we will consider Bayesian logic programs to be
well-defined if nothing contrary is stated explicitly.

To summarize, as a consequence of the one-to-one mapping of ground atoms
onto random variables, we can switch between logical concepts like ground atoms
and the immediate consequence operator and concepts of probability theory like
random variable and direct influence. The Herbrand base of a Bayesian logic
program constitutes the set of random variables and its least Herbrand model
specifies the proper random variables of discourse. The immediate consequence
operator (and the combining rules) induces the direct influence relation over
the proper variables. Thus, the dependency graph augmented with the combined
conditional probability densities can in principle be interpreted as a (possibly
infinite) Bayesian network. This holds under the reasonable assumption of
well-definedness, i.e. no random variable influences itself or is influenced by
infinitely many other random variables. We will sometimes refer to the augmented
dependency graph as the Bayesian network of the given Bayesian logic program.

Figure 6. Part of the directly influenced by relation of (a) the first and (b) the
second ill-defined Bayesian logic program of example 4.10.

5. Query-Answering Procedure

In this section, we show how to answer probabilistic queries to a Bayesian
logic program. In analogy to Prolog queries, a probabilistic query to a Bayesian
logic program is defined as follows:

Definition 5.1 (probabilistic query). A probabilistic query to a Bayesian logic
program B is an expression of the form
?- Q_1, ..., Q_n | E_1 = e_1, ..., E_m = e_m with $n > 0$, $m \ge 0$. It asks for
the conditional density $p(Q_1, \dots, Q_n \mid E_1 = e_1, \dots, E_m = e_m)$ of the
query variables $Q_1, \dots, Q_n$ where
$\{Q_1, \dots, Q_n, E_1, \dots, E_m\} \subseteq \mathrm{HB}(B)$. A query with
$m = 0$ is called evidence-free.

The definition generalizes the inference problem for Bayesian networks, see
definition 2.2. Due to Theorem 4.9, we say that an answer is defined if and
only if $\{Q_1, \dots, Q_n, E_1, \dots, E_m\} \subseteq \mathrm{LH}(B)$. To answer a
query we adapt the two-step strategy of knowledge-based model construction
approaches (see e.g. [BGW94,Had99]): first, we construct a Bayesian network N
and, second, we apply a Bayesian network inference algorithm on N in order to
answer the query. We first assume that the answer is defined and will come back
to the more general case in Section 5.3.

A naive approach would be to explicitly build the Bayesian network
representing the directly influenced by relation over the proper random
variables. Because the resulting Bayesian network may be infinite, this is
impossible in general. The way out of the problem is to use support networks.
The notion of support network is due to Ngo and Haddawy [NH97] but we will adapt
it for our purposes.

Definition 5.2 (support network). Let B be a Bayesian logic program, N its
(possibly infinite) Bayesian network and $X_1, \dots, X_m$, $m > 0$, nodes of N. The
support network $N(X_1, \dots, X_m)$ of $X_1, \dots, X_m$ is the subnetwork of N which
consists of the nodes

\[
\{Y \in \mathrm{LH}(B) \mid Y \text{ influences some } X_i, 1 \le i \le m\} \tag{5.1}
\]

and the edges of N which connect only nodes in $N(X_1, \dots, X_m)$.

Figure 7. The support network N(height(fred)) with respect to the height Bayesian logic program.

The support network N(height(fred)) with respect to our running example is
shown in Figure 7. We will now prove that the support network is sufficient to
compute any conditional probability density involving only random variables of
the support network.
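
Before turning to the proof, a support network can be computed for the finite
height program by collecting every variable that influences a query variable.
This is a minimal sketch: the dictionary PA is a hypothetical encoding of the
parents in the running example, the function name support_network is
illustrative, and the query variables themselves are included among the nodes,
as in Figure 7.

```python
PEDIGREE = {  # child: (mother, father), read off example 3.10
    "fred": ("ann", "unknown1"), "dorothy": ("ann", "brian"),
    "eric": ("cecily", "brian"), "gwenn": ("ann", "unknown2"),
    "henry": ("dorothy", "fred"), "irene": ("gwenn", "eric"),
    "john": ("irene", "henry"),
}

PA = {}  # Pa(A) for every proper random variable A
for child, (mother, father) in PEDIGREE.items():
    body = [f"mother({mother},{child})", f"father({father},{child})",
            f"height({mother})", f"height({father})"]
    PA.setdefault(f"height({child})", set()).update(body)
    for atom in body:
        PA.setdefault(atom, set())


def support_network(pa, queries):
    """Nodes influencing some query variable (plus the queries) and the edges among them."""
    nodes, stack = set(), list(queries)
    while stack:
        x = stack.pop()
        if x not in nodes:
            nodes.add(x)
            stack.extend(pa[x])
    # The node set is closed under parents, so every edge into it starts inside it.
    edges = {(parent, child) for child in nodes for parent in pa[child]}
    return nodes, edges


NODES, EDGES = support_network(PA, ["height(fred)"])
print(sorted(NODES))
```

Nothing about john, henry or irene is visited, so the cost of the construction
depends on the ancestors of the query and not on the size of the whole program.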

Theorem 5.3. Let N be a possibly infinite Bayesian network, $Q_1, \dots, Q_n$,
$n > 0$, nodes of N and $E = e$, $E \subset N$. The computation of
$p(Q_1, \dots, Q_n \mid E = e)$ does not depend on any node X of N which is not a
member of the support network $N(\{Q_1, \dots, Q_n\} \cup E)$.



Proof. This proof can be skipped without loss of continuity. Background mate- 
rial and definitions from probability theory are introduced in Section A in the 
appendix. 

In order to prove the theorem we only have to show that
$N(\{Q_1, \dots, Q_n\} \cup E)$ is sufficient to compute $p(X_1, \dots, X_l)$ for any
set $\{X_1, \dots, X_l\}$, $l > 0$, of random variables in
$N(\{Q_1, \dots, Q_n\} \cup E)$. The theorem follows then from the definition of
conditional probability density:

\[
p(Q_1, \dots, Q_n \mid E = e) = \frac{p(Q_1, \dots, Q_n, E = e)}{p(E = e)}
\]

We proceed in a similar way to the proof of theorem 4.9. Let $\pi$ be a total order
of the nodes in N. Let T be a (non-empty) index set, $\mathcal{H}(T)$ be the set of
all non-empty, finite subsets of T, and $(A_n)_{n \in T}$ be the sequence of random
variables in N in ascending order according to $\pi$. Analogously to the proof of
theorem 4.9 we can prove by induction that N specifies a projective family of
probability measures. Now, the set $\{Q_1, \dots, Q_n\} \cup E$ corresponds to A(H)
for some $H \in \mathcal{H}(T)$, and the set $\{X_1, \dots, X_l\}$ corresponds to
A(L) for some $L \in \mathcal{H}(T)$. In order to compute $p(X_1, \dots, X_l)$ we
consider the completion

\[
C(L) = \{D \in N \mid D \text{ influences some } X_i \in A(L)\}
\]

of A(L) (resp. C(H) of A(H)). The set C(H) equals by definition the set
of nodes of $N(\{Q_1, \dots, Q_n\} \cup E)$. Therefore, we have
$A(L) \subseteq C(H)$ and, hence, $C(L) \subseteq C(H)$. As in the proof of
theorem 4.9, the probability densities over C(H) (resp. C(L)) are specified by a
unique Bayesian network N(H) (resp. N(L)). It consists of all random variables in
C(H) (resp. C(L)) and of all edges between nodes in N which are random variables
in C(H) (resp. C(L)). Since N specifies a projective family of probability
measures, N(L) is a subnetwork of N(H). That means the computation of
$p(X_1, \dots, X_l)$ only depends on nodes and edges in N(H). But N(H) is by
definition the support network $N(\{Q_1, \dots, Q_n\} \cup E)$. The theorem is
proven. □

Thus, we can answer the probabilistic query
?- Q_1, ..., Q_n | E_1 = e_1, ..., E_m = e_m using the support network
$N(\{Q_1, \dots, Q_n\} \cup \{E_1, \dots, E_m\})$. The next two sections deal with
computing support networks. They rely on the following property of support
networks:

Proposition 5.4. Let B be a Bayesian logic program, N its (possibly infinite)
Bayesian network and $X_1, \dots, X_m$, $m > 0$, nodes of N. The support network
$N(X_1, \dots, X_m)$ is the graph union G of all single support networks $N(X_i)$.

Proof. First, we show that $N(X_1, \dots, X_m)$ and G have the same set of nodes.
The support network $N(X_1, \dots, X_m)$ has by definition a node A if and only if
A is influencing an $X_i \in \{X_1, \dots, X_m\}$. But A is influencing an
$X_i \in \{X_1, \dots, X_m\}$ if and only if A is a node in $N(X_i)$, i.e. A is a
node in G.

Now, we prove that $N(X_1, \dots, X_m)$ and G have the same set of edges. A
support network $N(X_1, \dots, X_m)$ has an edge E from a node $A_i$ to a node $A_j$
if and only if (1) both nodes, $A_i$ and $A_j$, are influencing an
$X_k \in \{X_1, \dots, X_m\}$ and are therefore in $N(X_k)$, and (2) the edge E is
in N. By definition of a support network this is the case if and only if the edge
E is in $N(X_k)$, i.e. E is an edge in G. □

5.1. Evidence-free Probabilistic Queries 

We first restrict ourselves to evidence-free queries of the form ?- Q_1.

Example 5.5. Given the Bayesian logic program of example 3.10, the
probabilistic query ?- height(fred) asks for the densities p(height(fred)). The
information about john stated in B is irrelevant, because it does not appear in
the support network N(height(fred)) shown in Figure 7. □

But how do we build the support network? 

5.1.1. AND/OR Trees

The usual execution model of logic programs relies on the notion of SLD
trees (see e.g. [Llo89,SS86]). For our purposes AND/OR trees [Nil71,KK71,VM75,
Nil86] are more suitable because they allow us, as we will see, to combine the
probabilistic with the logical computations. AND/OR trees have a long history
in the AI community and we adapt them here for our purposes.

Figure 8. The AND/OR tree T(height(fred)) according to example 5.8. Unfilled ovals represent or
nodes, whereas and nodes are represented by filled ovals. The dotted box indicates the solution
graph S(height(fred)), i.e. all unsolved nodes together with their in- and outgoing edges, in
particular all infinite paths, are removed.

Definition 5.6 (AND/OR tree). Let B be a Bayesian logic program and ?- Q an
evidence-free probabilistic query. The AND/OR tree T(Q) of the query given B is
a tree whose nodes are divided into two disjoint sets, the set of and nodes and
the set of or nodes. Each node contains a conjunction of ground atoms. The nodes
?- A_1, ..., ?- A_n constitute all children of an and node ?- A_1, ..., A_n. An or
node ?- A has a child ?- (A_1, ..., A_n)θ if a Bayesian definite clause
A' | A_1, ..., A_n in B and a substitution θ exist, such that θ grounds the clause
and A'θ = A. The only node ?- Q having no predecessors is called the root node and
it is always an or node.

In describing AND/OR trees we shall continue to use terms like parent nodes,
successor nodes, paths etc. with the obvious meaning. While constructing an
AND/OR tree we interpret a ground fact A as a ground clause A | true where
the symbol true is a built-in with the usual meaning. Our objective for using an
AND/OR tree T(Q) is to show that Q is solved, i.e. $Q \in \mathrm{LH}(B)$.

Definition 5.7 (solved). An and node containing ?- true is solved. Any other 
and node is solved if it has at least one child and all of its children are solved. An 
or node is solved if at least one of its children is solved. 
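
Definition 5.7 translates into two mutually recursive checks. The sketch below
works on a hand-written, already grounded clause table GROUND (a hypothetical
encoding that maps each or node to the bodies of its and-node children); the
function names and the r(a) entry are illustrative. Branches that loop back to
an atom already on the current path are treated as unsolved, which mirrors the
removal of infinite paths from the solution graph.

```python
GROUND = {  # or-node atom -> bodies of its and-node children (ground clause instances)
    "mother(ann,fred)": [["true"]],
    "father(unknown1,fred)": [["true"]],
    "height(ann)": [["true"]],
    "height(unknown1)": [["true"]],
    "height(fred)": [["mother(ann,fred)", "father(unknown1,fred)",
                      "height(ann)", "height(unknown1)"]],
    # Hypothetical looping program: r(a). together with r(X) | r(X).
    "r(a)": [["r(a)"], ["true"]],
}


def and_solved(body, path):
    """?- true is solved; otherwise at least one child and all children solved."""
    if body == ["true"]:
        return True
    return len(body) > 0 and all(or_solved(atom, path) for atom in body)


def or_solved(atom, path=frozenset()):
    """An or node is solved if at least one of its and-node children is solved."""
    if atom in path:  # the branch loops back: an infinite path, never solved
        return False
    return any(and_solved(body, path | {atom}) for body in GROUND.get(atom, []))


print(or_solved("height(fred)"), or_solved("height(zoe)"), or_solved("r(a)"))
```

Here r(a) is still solved through its fact even though one of its branches is
infinite, while the unknown atom height(zoe) has no children and stays unsolved.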

Example 5.8. Consider again the Bayesian logic program of example 3.10. The
AND/OR tree T(height(fred)) of Figure 8 states that height(fred) is solved
because the and node

?- mother(ann,fred), father(unknown1,fred), height(ann), height(unknown1)

is solved. This node in turn is solved because ?- father(unknown1,fred),
?- mother(ann,fred), ?- height(ann) and ?- height(unknown1) are all
solved. □

Definition 5.9 (solution tree [Nil71]). Let T(Q) be an AND/OR tree. The
solution tree S(Q) of T(Q) is the maximal finite subtree of solved nodes of T(Q).

A solution tree partly represents the immediate consequence operator and hence
a subset of the directly influenced by relation over LH(B). This is due to the
following property:

Property 5.10. There exists an edge from a solved or node labeled A to a solved
and node labeled C if and only if there exists a ground instance A | C of a clause
in B such that $\{A\} \cup C \subseteq \mathrm{LH}(B)$.

All or nodes of a solution tree correspond to proper random variables; we call the
set of random variables corresponding to the nodes relevant with respect to the
given Bayesian logic program and probabilistic query. A solution tree S(Q) does
not only encode $Q \in \mathrm{LH}(B)$ and its set of relevant random variables, it
also encodes all ways of proving that its or nodes are solved. It follows that the
solution tree is unique. This makes it possible to represent the solution tree in
a more compact and more suitable way: its collapsed version. All nodes containing
the same query are merged. We call the resulting graph the solution graph S(Q).
The solution graph is a more suitable representation because there exists a
one-to-one mapping between the or nodes of S(Q) and the relevant random
variables, so that the following properties hold:

Property 5.11. There exists an edge e in S(Q) going from an or node O to
an and node A if and only if there exists a clause $C \in B$, such that O | A is a
ground instance of C and $\{O\} \cup A \subseteq \mathrm{LH}(B)$.

Property 5.12. Let O be an or node in S(Q) and $X \in \mathrm{LH}(B)$ its
corresponding random variable. Furthermore, let
$\{X_1, \dots, X_n\} \subseteq \mathrm{LH}(B)$ be the random variables corresponding
to the grandchildren of O in S(Q). Then $\mathrm{Pa}(X) = \{X_1, \dots, X_n\}$.

Hence, we can augment the edge e with the conditional probability densities 
cpd(C), denoted as cpd(e). 

Example 5.13. Figure 9 shows the solution graph of the probabilistic query
?- height(fred) augmented with conditional probability densities. □

Figure 9. The (collapsed) solution graph of height(fred) augmented with probabilistic
information: The conditional probability densities p(O | A) are associated to each edge from an or
node O to an and node A. The densities p(O | A) equal cpd(c), the densities associated to the
Bayesian clause corresponding to the edge.

Now, we have everything to prove the main theorem of this subsection.

Theorem 5.14. Let B be a well-defined Bayesian logic program and ?- Q an
evidence-free probabilistic query with $Q \in \mathrm{LH}(B)$. The support network
N(Q) is the result of performing for each or node A (over predicate r) in the
solution graph S(Q) the following steps:

1. Compute the combined conditional probability densities of A:

   cpd(A) = comb(r){cpd(e) | e is an outgoing edge of A}.

2. Associate cpd(A) to the node A.

3. Remove all children of A in N and their in- and outgoing edges.

4. Insert a directed edge from each former grandchild of A, on which cpd(A)
   is conditionalized, to A itself.

The resulting augmented graph is N(Q).

Proof. Let B be a well-defined Bayesian logic program and ?- Q a probabilistic
query to B with $Q \in \mathrm{LH}(B)$. Remember that the solution graph S(Q) is
given in its collapsed version and that it is augmented with the corresponding
probability distributions. It is clear that the described transformation on S(Q)
yields a Bayesian network N whose dependency structure coincides with a subset of
the directly influenced by relation over LH(B) and which encodes the joint
probability density. But we have to prove that N is a support network of Q. It is
trivial that N fulfils the conditions 1, 2 and 3 of definition 5.2. What remains
to be proven is that N is of minimal size. Let us assume that N is not minimal,
i.e. that a Bayesian network N' exists which fulfils the conditions 1, 2 and 3 but
is of smaller size. Because of condition 3 we can assume that N has at least one
node U which is not a node of N'. Due to condition 1, U is not the root node
Q of N. So, let U be any other node except for the root node. We can follow a
path from U through its children and must come to a node U' in N' (at least the
root node, which is identical in N and N'). This node has
$\mathrm{Pa}(U') \not\subseteq N'$. Thus U' violates condition 2. This contradicts
our assumption that N' is a smaller Bayesian network fulfilling the conditions 1,
2 and 3. □

Figure 10. The support network of the query ?- height(fred) | height(eric) of
example 5.15.
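
The four steps of Theorem 5.14 can be sketched on the solution graph of
?- height(fred). The encoding is hypothetical: each or node maps to its outgoing
edges, an edge is a pair of the and-node body and an opaque label standing for
cpd(e), and the labels themselves are placeholders. Only the identity combining
rule of example 3.10 is shown, and every grandchild is assumed to be a variable
on which cpd(A) is conditionalized.

```python
SOLUTION_GRAPH = {  # or node -> [(and-node body, label standing for cpd(e))]
    "height(fred)": [(("mother(ann,fred)", "father(unknown1,fred)",
                       "height(ann)", "height(unknown1)"), "cpd(height clause)")],
    "mother(ann,fred)": [(("true",), "cpd(mother(ann,fred))")],
    "father(unknown1,fred)": [(("true",), "cpd(father(unknown1,fred))")],
    "height(ann)": [(("true",), "cpd(height(ann))")],
    "height(unknown1)": [(("true",), "cpd(height(unknown1))")],
}


def identity(cpds):
    """Identity combining rule: defined only for exactly one input density."""
    (only,) = cpds
    return only


def to_support_network(solution_graph, comb):
    """Collapse the and nodes of a solution graph into parent sets and densities."""
    parents, cpd = {}, {}
    for or_node, edges in solution_graph.items():
        # Steps 1 and 2: combine the densities on the outgoing edges and attach them.
        cpd[or_node] = comb([label for _, label in edges])
        # Steps 3 and 4: drop the and nodes and link the former grandchildren to A.
        parents[or_node] = {g for body, _ in edges for g in body if g != "true"}
    return parents, cpd


PARENTS, CPD = to_support_network(SOLUTION_GRAPH, identity)
print(sorted(PARENTS["height(fred)"]), CPD["height(fred)"])
```

A predicate with several clauses per ground atom would need a genuine combining
rule such as max or noisy_or in place of identity.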

5.2. Probabilistic Queries With Evidence 

So far, we have restricted ourselves to evidence-free queries of the form ?- Q_1.
The extension to the more general case concerns queries of the form

?- Q_1, ..., Q_n | E_1 = e_1, ..., E_m = e_m

with n > 0, m > 0. Due to theorem 5.3 we have to build the support network
$N(\{Q_1, \dots, Q_n, E_1, \dots, E_m\})$. Due to proposition 5.4 this is the graph
union of all single support networks
$N(Q_1), \dots, N(Q_n), N(E_1), \dots, N(E_m)$. Consequently, the procedures and
claims about evidence-free queries can easily be adapted.

5.3. Characteristics of the Query-Answering Procedure 

Here, we study the minimality of the support network as well as the
soundness of the proposed inference procedure.

Let us first point out that the support network is not minimal with respect
to a given instance of the inference problem.

Example 5.15. Figure 10 shows the support network for

?- height(fred) | height(eric).

It consists of two connected components, one for height(fred) and one for
height(eric), of which the latter one is redundant. □

This is related to a well-known problem in Bayesian networks: which nodes of
a given Bayesian network are relevant to compute a desired density? Various
answers exist in the literature, e.g. Geiger et al. [GVP90] or Shachter's
Bayes-Ball [Sha98], and all of them can be applied to the constructed support
network. One refinement to our algorithm is due to Ngo and Haddawy [NH97]: if no
(undirected) path between the query variable and an evidence variable exists,
then the support network of that evidence variable can be deleted.
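
The refinement attributed to Ngo and Haddawy amounts to a connectivity test on
the undirected version of the support network. The sketch below applies it to
the query ?- height(fred) | height(eric) of example 5.15; the helper family, the
dictionary PA and the function prune_evidence are illustrative names, and the
test is plain reachability, not a full relevance analysis such as Bayes-Ball.

```python
from collections import deque


def family(child, mother, father):
    """Parents of height(child) in the height program, plus its parentless facts."""
    body = {f"mother({mother},{child})", f"father({father},{child})",
            f"height({mother})", f"height({father})"}
    return {f"height({child})": body, **{atom: set() for atom in body}}


# Support network of ?- height(fred) | height(eric): union of N(fred) and N(eric).
PA = {**family("fred", "ann", "unknown1"), **family("eric", "cecily", "brian")}


def prune_evidence(pa, queries):
    """Keep only the nodes joined to a query variable by an undirected path."""
    neighbours = {n: set() for n in pa}
    for child, child_parents in pa.items():
        for parent in child_parents:
            neighbours[child].add(parent)
            neighbours[parent].add(child)
    kept, frontier = set(queries), deque(queries)
    while frontier:  # breadth-first search over undirected edges
        for n in neighbours[frontier.popleft()]:
            if n not in kept:
                kept.add(n)
                frontier.append(n)
    return kept


KEPT = prune_evidence(PA, ["height(fred)"])
print(sorted(KEPT))
```

In this example the whole component of height(eric) is dropped, so the evidence
has no effect on the answer.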

If one of the random variables occurring in a query is not proper then the 
solution graph of that variable is empty and, hence, no density is specified. The 
answer to the query is undefined. Furthermore, if the Bayesian logic program 
is ill-defined, e.g. due to infinite branching factors or an infinite path then the 
procedure would not terminate, too. 

Definition 5.16 (well-defined probabilistic query). A probabilistic query
Q_1, ..., Q_n | E_1 = e_1, ..., E_m = e_m is well-defined, if the AND/OR graphs of all
variables Q_1, ..., Q_n, E_1, ..., E_m are finite.

Like the definition of well-defined Bayesian logic programs, this definition is
motivated by computability. Remember that our query-answering procedure
follows the knowledge-based model construction approach. It consists of two
phases: (1) construct the support network of the given probabilistic query and
(2) apply a Bayesian network inference algorithm to the support network. Thus,
we have to guarantee that the construction of the support network terminates.
The AND/OR graph of each random variable in a well-defined query is finite, so the
construction of the support network terminates in finite time. Thus, our proposed
query-answering procedure is sound for well-defined probabilistic queries.

6. Interpreting Bayesian Logic Programs 

Though the previous section describes a query-answering procedure, the
question of how to compute the solution graphs themselves is still open. This
section shows that Bayesian logic programs can be interpreted using a common
meta interpreter written in Prolog. This in turn gives us an efficient and
particularly practical approach for constructing support networks.

6.1. The "missing" link between solution graphs and SLD trees 

To come up with such a meta interpreter we analyze the relation of solution 
graphs to SLD trees [Llo89,SS86]. The relation was already investigated in the 
early days of Artificial Intelligence by Kowalski and Kuehner [KK71, page 250]. 
But for reasons of self-containedness we will discuss it here. 

The SLD tree graphically represents all possible sequences of applications of the
SLD resolution inference rule to a goal (cf. [Llo89]). A goal is an expression
of the form ?- G_1, ..., G_n where all G_i are Bayesian atoms. For Bayesian logic
programs, we have:

Definition 6.1 (SLD tree). Let B be a Bayesian logic program and ?- G a
goal. An SLD tree for B ∪ {G} fulfils the following conditions:

1. Each node A of the tree is a (possibly empty) goal. The set of atoms of A is
denoted by atoms(A).

2. Let ?- A_1, ..., A_m, ..., A_k (k ≥ 1) be a node of the tree where A_m is the
selected atom. Then, this node has for each clause A :- B_1, ..., B_n in B,
such that A_m and A unify with the most general unifier (MGU) θ, a successor
node

?- (A_1, ..., A_{m-1}, B_1, ..., B_n, A_{m+1}, ..., A_k)θ.

Figure 11. The SLD tree of the goal ?- height(gwenn). The edges are labeled with the applied
MGUs and clauses. The corresponding solution graph is shown in Figure 9.



3. Nodes representing the empty clause have no successor. 

Thus, a single edge corresponds to a single application of the resolution inference
rule which could be seen as a kind of Modus Ponens in the case of definite clause
programs [RN95]. In describing SLD trees we shall continue to use terms like
parent nodes, successor nodes, paths etc., giving them the obvious meaning. A
successful path is a path starting at the root and leading to a terminal node
representing the empty clause. The (finite) SLD tree of the goal ?- height(gwenn)
is shown in Figure 11. We label each edge e of an SLD tree with the MGU θ(e)
and the clause C(e) used in the corresponding application of the resolution
inference rule. Let S be an SLD tree of a (well-defined) Bayesian logic program
B and ?- A_1, ..., A_k a goal G. Furthermore, let P_1, ..., P_n be the nodes along a
successful path P of length n in S and e_j the edge pointing from P_j to P_{j+1}. It is
well-known (see e.g. [Llo89]) that for range-restricted definite clause programs
(such as Bayesian logic programs)

    ⋃_{j=1}^{k} {A_j θ_{G,P}} ⊆ LH(B),

where θ_{G,P} is the substitution θ_{G,P} = θ(e_1) ··· θ(e_{n-1}). The substitution θ_{G,P} is
called the answer substitution. Because every node P_i in an SLD tree is itself a goal
the same is true for all nodes along P, i.e. ⋃_{A ∈ atoms(P_i)} {A θ_{P_i,P}} ⊆ LH(B). Thus,
each grounded node P_i θ_{G,P} consists of proper random variables. The ground
clause C(e_j)θ_{G,P} corresponds to an edge from the or node head(C(e_j))θ_{G,P} to
the and node body(C(e_j))θ_{G,P} in a solution graph. According to the definition
of AND/OR trees, all children of the and node body(C(e_j))θ_{G,P} are determined
by the node itself. Therefore, if we require all A_i to be ground atoms then we
can construct the union of the solution graphs of A_1, ..., A_k from the set of all
grounded successful paths.



6.2. An implementation 

Having established the link between SLD trees and solution graphs of a
Bayesian logic program it is easy to interpret Bayesian logic programs. E.g. one
could adapt a backward chaining algorithm presented in [RN95, page 275] as done
in the procedures SupportNetwork, SLD-Tree and ComputeSupportNetwork.

The procedure SupportNetwork establishes the overall flow. After ini- 
tializing the support network N and the SLD tree Tree to be empty, it computes 
both of them calling first SLD-Tree and then ComputeSupportNetwork. At 
the end, one could prune the support network N, although we will not investigate 
this. 

The backward-chaining algorithm SLD-Tree works by first checking to see 
if any Bayesian ground fact unifies with the query Q. If so, corresponding edges 
are inserted into the SLD tree Tree. It then finds all Bayesian clauses whose 
head unifies with the query Q, and tries to prove the bodies of those Bayesian 
clauses, also by backward-chaining. SLD-Tree processes the body of a selected 
Bayesian clause atom by atom, building up the whole SLD tree Tree. Remember 
that each selected Bayesian clause together with the query and the MGU spec- 
ifies a particular set of edges in the SLD tree Tree. The clause, the MGU and 
the corresponding associated conditional probability densities are stored together 
with the edge in Tree. 

ComputeSupportNetwork inspects all successful paths P in Tree in turn
and builds up the corresponding support network N. It works by inserting for a
ground clause C(e)θ (gathered from the information associated with an edge e in
P) all corresponding nodes and edges in the support network N. After building
this "uncombined" version of the support network N, it traverses N and applies
the corresponding combining rules on each node. This may delete some of the
edges.

A slightly modified and naive implementation of these procedures in
Prolog can be found in the Appendix. It builds on a Prolog meta interpreter.
Various types of such meta interpreters relying on the SLD tree exist (see
e.g. [SS86,Bra86]). We adapted a simple one relying on depth-first search.
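
To make the idea of a depth-first meta interpreter concrete, here is a minimal
Python sketch. It is not the Prolog implementation from the Appendix. The term
encoding (tuples for atoms, capitalised strings for variables), all function names
and the small height program are assumptions made for illustration, and the
unifier omits the occurs check.

```python
from itertools import count


def is_var(t):
    # Assumption: variables are strings starting with an upper-case letter.
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Return an extended substitution unifying a and b, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def resolve(t, s):
    t = walk(t, s)
    return tuple(resolve(x, s) for x in t) if isinstance(t, tuple) else t


def rename(t, n):
    if is_var(t):
        return f"{t}_{n}"
    return tuple(rename(x, n) for x in t) if isinstance(t, tuple) else t


def solve(goals, program, s, used, fresh):
    """Depth-first SLD resolution; yields (substitution, clauses used)
    for every successful path."""
    if not goals:
        yield s, used
        return
    first, rest = goals[0], goals[1:]
    for head, body in program:
        n = next(fresh)
        h, b = rename(head, n), [rename(x, n) for x in body]
        s2 = unify(first, h, s)
        if s2 is not None:
            yield from solve(b + rest, program, s2, used + [(h, b)], fresh)


def ground_clauses(query, program):
    """Ground clause instances occurring on successful paths of the query."""
    out = set()
    for s, used in solve([query], program, {}, [], count()):
        for h, b in used:
            out.add((resolve(h, s), tuple(resolve(x, s) for x in b)))
    return out


# Hypothetical program in the spirit of the height example.
PROGRAM = [
    (("mother", "ann", "gwenn"), []),
    (("father", "unknown1", "gwenn"), []),
    (("height", "ann"), []),
    (("height", "unknown1"), []),
    (("height", "X"),
     [("mother", "Y", "X"), ("father", "Z", "X"), ("height", "Y"), ("height", "Z")]),
]
```

Calling ground_clauses(("height", "gwenn"), PROGRAM) collects the grounded
clauses along the successful path; these are the raw material from which a
support network can be assembled.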



Data  : B, a well-defined Bayesian logic program; Q, a query variable.
Result: N, the support network of ?- Q.

N ← EmptyBayesianNetwork;
Tree ← EmptySLDTree;
SLD-Tree(B, Q, Tree);
ComputeSupportNetwork(Tree, N);
N ← Prune(N);

Algorithm 1: SupportNetwork(B, Q, N)



7. Examples of Bayesian Logic Programs 

In this section, we illustrate the representational power and elegance of
Bayesian logic programs by demonstrating that Bayesian networks, definite clause
programs (as in "pure" Prolog), hidden Markov models and dynamic Bayesian
networks can straightforwardly be encoded as Bayesian logic programs. Furthermore,
we give examples of Bayesian logic programs involving structured terms
(and having a countably infinite least Herbrand model). But before doing so, let us
put Bayesian logic programs in a wider context. The proof of Theorem 4.9 makes
a more abstract interpretation of Bayesian logic programs possible. Given a total
order on the least Herbrand model, a Bayesian logic program represents a
discrete-time stochastic process (cf. the appendix). If we see the total order as a time
line, then the state of a single proper random variable could depend on all
variables in the past. Such processes are called infinite-memory. This places Bayesian
logic programs in the wider context of what Cowell et al. call highly structured
stochastic systems (cf. [CDLS99]). Well-known probabilistic frameworks such as
dynamic Bayesian networks, hidden Markov models or Kalman filters are special
cases of such highly structured stochastic systems.

Data  : Tree, an SLD tree.
Result: N, a support network.

foreach successful path P ∈ Tree where e_1, ..., e_n are the edges of P
        starting at the root node do
    θ ← composition of θ(e_1), ..., θ(e_n);
    for i = n down to 1 do
        insert node head(C(e_i)θ) into N;
        if C(e_i)θ is not a ground fact then
            foreach ground atom b ∈ body(C(e_i)θ) do
                insert node b into N;
                insert an edge from b to head(C(e_i)θ) into N;
            end
        end
        Store cpd(C(e_i)) at node head(C(e_i)θ);
    end
end
Apply to each node in N over a predicate r the corresponding combining
rule comb(r);
/* Note that this may delete some edges in N */

Algorithm 2: ComputeSupportNetwork(Tree, N)
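
A hedged Python rendering of Algorithm 2 might look as follows. It assumes that
each edge of a successful path has already been grounded with the composed
substitution and is given as a (head, body, cpd) triple; the combining rule is
passed in as an ordinary function. These representation choices, the names and
the placeholder densities are illustrative assumptions. The sketch also does not
model the deletion of edges by a combining rule.

```python
def compute_support_network(paths, combining_rule):
    """Build (parents, cpds) from grounded successful paths.

    paths: iterable of paths, each a list of (head, body, cpd) triples
           ordered from the root of the SLD tree.
    combining_rule: function mapping the list of (body, cpd) entries
           collected at a node to a single combined density.
    """
    parents = {}   # node -> set of parent nodes
    partial = {}   # node -> list of distinct (body, cpd) entries
    for path in paths:
        for head, body, cpd in reversed(path):   # i = n down to 1
            parents.setdefault(head, set())
            for atom in body:
                parents.setdefault(atom, set())
                parents[head].add(atom)
            entry = (tuple(body), cpd)
            if entry not in partial.setdefault(head, []):
                partial[head].append(entry)
    cpds = {node: combining_rule(entries) for node, entries in partial.items()}
    return parents, cpds


# Hypothetical grounded path; the cpd values are placeholder labels.
path = [
    ("height(gwenn)", ("height(ann)", "height(unknown1)"), "cpd_rule"),
    ("height(ann)", (), "cpd_ann"),
    ("height(unknown1)", (), "cpd_unknown1"),
]
parents, cpds = compute_support_network(
    [path, path], lambda entries: [cpd for _, cpd in entries]
)
```

Passing the same path twice shows that a ground clause reached along several
paths contributes only one entry to the input of the combining rule.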

7.1. Bayesian networks 

Section 3.1 has already shown that every Bayesian network directly trans- 
lates to a propositional Bayesian logic program. Here, we give a further illustra- 
tion. Consider the famous example due to Judea Pearl about burglary alarms at 
home (cf. e.g. [RN95]). It translates to 

burglary.
earthquake.

alarm | burglary, earthquake.

johncalls | alarm.
marycalls | alarm.

where the associated conditional probability densities are identical to the densities 
of the Bayesian network. 
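
The correspondence between the propositional clauses and the graph of the
Bayesian network can be sketched in Python. The list-of-pairs encoding of the
clauses is an assumption made for illustration; the densities are omitted.

```python
from graphlib import TopologicalSorter

# Each Bayesian clause "head | body" as a (head, body) pair.
clauses = [
    ("burglary", []),
    ("earthquake", []),
    ("alarm", ["burglary", "earthquake"]),
    ("johncalls", ["alarm"]),
    ("marycalls", ["alarm"]),
]

# The body atoms of a clause are exactly the parents of its head.
parents = {head: set(body) for head, body in clauses}

# A topological order exists because the induced graph is acyclic.
order = list(TopologicalSorter(parents).static_order())
```

Any such order lists burglary and earthquake before alarm, and alarm before
johncalls and marycalls, which is the order in which a sampling or inference
routine would visit the nodes.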

7.2. Definite Clause Logic 

Table 2. The conditional probability densities associated to a logical Bayesian clause
A | A_1, ..., A_n.

Another interesting subclass of Bayesian logic programs is that of definite clause
programs, i.e. "pure" Prolog programs. They give us the power of "pure" Prolog to
model deterministic knowledge within the framework of Bayesian logic programs.
Let us start with a simple example. The logic program parents

father(jef, paul).
mother(an, paul).

parent(X, Y) :- father(X, Y).

parent(X, Y) :- mother(X, Y).

defines the parent relation in terms of father and mother. 

Example 7.1. We model this with the Bayesian logic program

father(jef, paul).

mother(an, paul).

parent(X, Y) | father(X, Y).

parent(X, Y) | mother(X, Y).

Each predicate has as domain {true, false} and max as combining rule. To
each clause C we associate cpd(C) as in Table 2. It is easy to compute
p(father(jef, paul) = true) = 1.0 and p(parent(jef, paul) = false) = 0.0. □
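
The computation in Example 7.1 can be sketched in Python. The sketch assumes
that the logical density of Table 2 assigns probability 1.0 to the head being true
exactly when all body atoms are true, and 0.0 otherwise; this reading, the
dictionary GROUND and the function name are assumptions for illustration.

```python
# Ground clauses of the support networks, as head -> list of bodies.
# A fact has a single empty body.
GROUND = {
    "father(jef,paul)": [[]],
    "mother(an,paul)": [[]],
    "parent(jef,paul)": [["father(jef,paul)"]],
    "parent(an,paul)": [["mother(an,paul)"]],
}


def p_true(atom):
    """P(atom = true) for a deterministic program with max combining.

    Returns None when the atom has no ground clause, i.e. when it is
    not a proper random variable and the query is undefined.
    """
    bodies = GROUND.get(atom)
    if not bodies:
        return None
    # Assumed logical cpd: 1.0 if every body atom is certainly true.
    per_clause = [
        1.0 if all(p_true(b) == 1.0 for b in body) else 0.0
        for body in bodies
    ]
    return max(per_clause)   # the max combining rule
```

With these definitions p_true("parent(jef,paul)") evaluates to 1.0, so the
probability of the value false is 0.0, in line with the example.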

Choosing other associated conditional probability densities one could easily im- 
plement some forms of negation such as explicit negation. The example generalizes 
to the following theorem. 

Theorem 7.2. Let L be a definite clause program, such that the solution graph
of each ground atom in LH(L) is finite. Let B be the Bayesian logic program which
is the result of applying the transformation of example 7.1 on L. Furthermore, let
G be a Bayesian ground atom. Then, B specifies P(G = true) = 1.0 if and only
if the logical ground atom corresponding to G is in LH(L).

Proof. Let L be a definite clause program, such that the solution graph of each
ground atom in LH(L) is finite, and B the Bayesian logic program which is the
result of applying the transformation of example 7.1 on L.

"⇒": The program B only specifies a density over G when G ∈ LH(B). Because
L is the corresponding logic program of B, G ∈ LH(L) holds.

"⇐": Let G ∈ LH(L). Because L is the corresponding logic program of B, it
follows G ∈ LH(B). Furthermore, the solution graph of G represents all possible
proofs of G. An induction over the number of nodes of the solution graph utilizing
the max combining rule proves P(G = true) = 1.0. □

The restrictions of theorem 7.2 are quite strong. First, the logic programs must
fulfill the conditions of a well-defined Bayesian logic program, i.e. the solution
graph of any provable ground atom must be finite. Second, the answer to a
probabilistic query of ground atoms not occurring in the least Herbrand model
is undefined. E.g. the parents logic program does not entail parent(jo, henry).
The solution graph of parent(jo, henry) is empty. In the second case, we refer
to Remark 5.3 and theorem 7.2. They justify interpreting this "null-value" as
P(Q = false) = 1.0. But still the first restriction is very strong. Many logic
programs (see e.g. example 4.10) do not fit theorem 7.2. Some provable ground
atoms have infinitely many proofs. We stipulate in theorem 4.9 a finite branching
factor of the solution graphs by virtue of finite computations, especially of the
combining rules. But if we modify the definition 3.6 of combining rules to allow
for infinite input sets, then the declarative semantics of Bayesian logic programs
could be extended to cover logic programs having solution graphs of finite depth
but infinite branching factors (see example 4.10). Under the modified declarative
semantics we could formulate the following theorem:

Theorem 7.3. Let L be a definite clause program and B the corresponding
logical Bayesian logic program. Then, B specifies P(G = true) = 1.0, G ∈ LH(B),
if and only if the corresponding logical ground atom of G is a member of LH(L).

This gives us the possibility to effectively model logical knowledge within Bayesian 
logic programs. 

7.3. Structured terms 

Consider the following well-defined Bayesian logic program (the associated 
conditional probability densities are not important): 

even(0).

even(s(X)) | odd(X).
odd(s(X)) | even(X).

It is easy to see that we can compute the answer to any probabilistic query, such
as odd(s(0)) | even(s(s(0))) = true. We have to compute the support network,
which is even(0) → odd(s(0)) → even(s(s(0))), feed it into a Bayesian network
engine and compute the answer. One can easily see that despite the presence
of structured terms and an infinite number of random variables, the required
computations are finite.
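
The finiteness of the computation can be made tangible with a small Python
sketch that follows the two clauses backwards from a queried atom. Representing
the term s^n(0) by the integer n, and the function names, are assumptions made
for illustration.

```python
def numeral(n):
    """Render the term s(s(...s(0)...)) with n applications of s."""
    return "0" if n == 0 else f"s({numeral(n - 1)})"


def body_atom(pred, n):
    """Body of the unique clause whose head matches pred(s^n(0)), n > 0:
    even(s(X)) | odd(X)  and  odd(s(X)) | even(X)."""
    return ("odd" if pred == "even" else "even", n - 1)


def support_chain(pred, n):
    """Nodes of the support network of pred(s^n(0)), root first.

    Returns [] if the derivation fails, i.e. the atom is not in the
    least Herbrand model.
    """
    chain = [(pred, n)]
    while chain[-1][1] > 0:
        chain.append(body_atom(*chain[-1]))
    if chain[-1] != ("even", 0):      # only even(0) is a fact
        return []
    return [f"{p}({numeral(k)})" for p, k in reversed(chain)]


chain = support_chain("even", 2)
```

The loop performs exactly n steps for the term s^n(0), so every query over this
program touches only finitely many of the infinitely many random variables.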

Figure 12. The generic structure of a dynamic Bayesian network (see [KKR95]). The associated
conditional probability densities quantifying the transition probabilities between states are called
the state evolution model and the conditional probability densities describing the observations
which can result from a state form the sensor model.



7.4. Dynamic Bayesian networks

A framework covering both Bayesian networks and hidden Markov models
is that of dynamic Bayesian networks [Kjæ95]. The fundamental idea is to
divide time into time slices, each representing a snapshot of the evolving
temporal process. These states of the world at a specific time point are described
using Bayesian subnetworks over a finite set of random variables. According to
Kanazawa et al. [KKR95] "a dynamic Bayesian networks consists of a sequence
of time slices where nodes within time slice t are connected to nodes in time
slice t + 1 as well as to other nodes within slice t." Hence, each time slice t is
a Bayesian subnetwork. The generic structure of a dynamic Bayesian network is
shown in Figure 12. If one assumes that the associated conditional probability
densities do not vary over time (a common assumption, see e.g. [KKR95]), one
can show that dynamic Bayesian networks can be represented using Bayesian
logic programs by combining the ideas of Section 7.1 and Section 7.3. We encode
the starting Bayesian network of time slice 0 using Bayesian atoms having 0 in
the last argument. The connections between a time slice t and a time slice t + 1
are modeled using clauses involving the term succ(T) in the head and the term
T in the body. Therefore, Bayesian logic programs clearly generalize dynamic
Bayesian networks. However, the structure expressible with Bayesian logic
programs is much more flexible than that of dynamic Bayesian networks. E.g. the
structure or the random variables of the time slices can vary over time. Hence,
Bayesian logic programs could better represent a particular situation in time.
For a discussion on why this is important we refer to the work of Glesner and
Koller [GK95].
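
As an illustration of this encoding, the following Python sketch unrolls a very
small dynamic Bayesian network into ground Bayesian clauses. The predicates
state and obs, and the single-variable time slices, are hypothetical; the sketch
only shows how the clauses state(succ(T)) | state(T) and obs(T) | state(T) would
be instantiated for successive time slices.

```python
def succ(t):
    """Build the structured term succ(t)."""
    return f"succ({t})"


def unroll(horizon):
    """Ground clauses (head, body) for time slices 0 .. horizon.

    Assumed program (not from the paper):
        state(0).
        state(succ(T)) | state(T).
        obs(T) | state(T).
    """
    t = "0"
    clauses = [(f"state({t})", ()), (f"obs({t})", (f"state({t})",))]
    for _ in range(horizon):
        nxt = succ(t)
        clauses.append((f"state({nxt})", (f"state({t})",)))   # state evolution
        clauses.append((f"obs({nxt})", (f"state({nxt})",)))   # sensor model
        t = nxt
    return clauses


slices = unroll(2)
```

Since the same two clauses are instantiated for every slice, the associated
densities are shared across time, which mirrors the assumption that they do not
vary over time.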

8. Related Work



Bayesian logic programs are related to all combinations of first order logic
with probability theory. However, we will focus on first order extensions of
Bayesian networks only. Other works such as the one by Ng and Subrahmanian
[NS92], who introduced a probabilistic characterization of logic programming,
by Sato [Sat95], or by Cussens and Muggleton on stochastic logic programs
[Mug96,Cus99b] will not be treated in this paper. We refer for surveys to
Parsons' article [Par96] and Section 3 of Cussens' paper [Cus99a]. This is in line with
Halpern's [Hal89] analysis of first order logics of probability. Halpern introduced
two probabilistic structures. A structure of type I represents the degree of belief
of an agent. Bayesian networks and Bayesian logic programs are examples of this
type of structure. A structure of type II represents statistical knowledge. One can
e.g. express the probability that a randomly chosen object has some property.
Stochastic logic programs are an example of this second type of structure. In
contrast to Bayesian logic programs, probabilities in stochastic logic programs
are defined directly on the proofs of atomic formulas.

Bayesian logic programs are motivated and inspired by the formalisms 
discussed in [Poo93,Had94,NH97,Jae97,FGKP99,Kol99]. They are most closely 
related to Ngo and Haddawy's knowledge-based model construction framework 
of probabilistic logic programs [NH97] (cf. Section 8.1). The idea of associat- 
ing conditional probability densities to clauses is also proposed by Fabian and 
Lambert [FL98], though they view a ground atom as a random variable over 
{true, false} and give a quite different declarative semantics. E.g. they do not 
have the concept of a combining rule but instead use a kind of backtracking 
mechanism to cope with the situation where one atom can be proven in different 
ways. 

We will now more closely investigate the relationship of Bayesian logic
programs to Ngo and Haddawy's probabilistic logic programs [NH97], Poole's
probabilistic Horn abduction [Poo93], Koller et al.'s probabilistic relational
models [Kol99,FGKP99] and Jaeger's relational Bayesian networks [Jae97].



8.1. Probabilistic logic programs 



Figure 13. The (grounded) SLD trees built to compute p(burglary(james)) within probabilistic
logic programs. The dotted edges indicate that the two SLD trees are computed in order to
compute p(burglary(james)).

Probabilistic logic programs [NH95,NH97] follow the knowledge-based model
construction technique and adopt, like Bayesian logic programs, the concept of least
Herbrand model to specify the relevant random variables. A query-answering
procedure exists and is based on SLD resolution. An example probabilistic logic
program (following [NH97]) is

P(neighbourhood(X, average)) = 0.4
P(neighbourhood(X, good)) = 0.3

P(burglary(X, yes) | neighbourhood(X, average)) = 0.4
P(burglary(X, no) | neighbourhood(X, good)) = 0.7

It consists of four so-called probabilistic sentences. Each such sentence quantifies a
probabilistic dependency among random variables, e.g. the conditional probability
of no burglary in the house of person X given that X has a good neighbourhood is
0.7. Even this simple example shows five main differences between probabilistic
logic programs and Bayesian logic programs.

First, in probabilistic logic programs ground atoms g(t_1, ..., t_{n-1}, t_n)
correspond to states t_n of random variables g(t_1, ..., t_{n-1}). When considering
models over probabilistic logic programs one must guarantee that
each random variable has at most one value. Therefore, Ngo and Haddawy
need to introduce so-called exclusivity constraints, such as
← neighbourhood(X, average), neighbourhood(X, bad). These are unnecessary for
Bayesian logic programs.

Secondly, Ngo and Haddawy employ an inference procedure that is exponentially
slower than ours to construct the knowledge base. This is best illustrated
with an example. Consider computing the density p(burglary(james)). Ngo and
Haddawy's inference engine would construct one successful path for each possible
value of all possible random variables influencing burglary(james), as shown
in Figure 13. In contrast, using Bayesian logic programs we would consider only
one successful path, burglary(james) ← neighbourhood(james). For this simple
example, this is a reduction by a factor of 4. It is easy to show that, if we assume
binary domains, Bayesian logic programs are exponentially more efficient.

Thirdly, in the probability models of Ngo and Haddawy, "each random variable
can assume a value from a finite set" [NH97, page 149], i.e. no (infinite)
discrete or continuous random variables can be considered. Their inference
procedure cannot cope with such variables. Furthermore, it is not entirely clear to
what extent Ngo and Haddawy deal with function symbols⁹.

Fourth, in probabilistic logic programs it is possible to employ partially
defined associated densities (as in the example above), i.e. some entries in the
conditional probability densities are undefined. Though Bayesian logic programs
could, in principle, be extended to allow for such partially defined densities¹⁰,
we prefer not to do so. Reasons for this are that it complicates the notation and
also that it is unclear whether there are any advantages of using such partially
defined densities, cf. the ongoing discussion in the literature (e.g. [NH97,Jae98]).

Fifth, probabilistic logic programs are much more complex than Bayesian
logic programs. They mix the qualitative information (the logical component) with
the quantitative information, whereas in Bayesian logic programs this information
is, as in Bayesian networks, nicely separated. This separation of the two
components is often considered one of the most important advantages of Bayesian
networks.

A further extension of probabilistic logic programs is the use of context
information. Context information is used to filter away sentences that do not
apply to the current query from the knowledge base. Consider e.g. the following
program (inspired by [NH97])

P(neighbourhood(X, bad)) = 0.2 ← lives_in(X, yorkshire)
P(neighbourhood(X, bad)) = 0.4 ← lives_in(X, vienna)

It states that we have different conditional probabilities depending on whether
lives_in(X, yorkshire) or lives_in(X, vienna) is true or false, given some external
logic program¹¹. Whereas context information may be important for efficiency
reasons, we believe it is more natural to view this information as deterministic
knowledge that can be specified using a "pure" Prolog program (a subset of
Bayesian logic programs), although the filter process used in [NH97] can easily be
incorporated into our framework. Using this approach, the previous probabilistic
logic program would be written as the following Bayesian logic program

neighbourhood(X) | lives_in(X, yorkshire)
neighbourhood(X) | lives_in(X, vienna)

where we assume the existence of some "pure" Prolog clauses defining lives_in.

⁹ [NH97] contains a theorem without proof about countably infinite "least Herbrand domains"
(i.e. RAS). However, in their main paper [NH97], Ngo and Haddawy define the semantics only
for the case that the "least Herbrand domain" is finite.

¹⁰ Alternatively, we could handle partially defined conditional probability densities using the
combining rules.

¹¹ The external logic program may employ negation, i.e. it is a so-called normal logic
program. We cannot deal with negation as failure or completion semantics but we could use
Bayesian logic programs to define negated predicates explicitly.



Figure 14. Probabilistic relational model of a genetic domain. We use the standard graphical
notation of entity/relationship models: ovals represent attributes and boxes entities. Dashed
lines indicate aggregations as parents, solid ones indicate attributes as parents.



To summarize, Bayesian and probabilistic logic programs are strongly
related. However, Bayesian logic programs are simpler, more natural, more
efficient and more expressive. They are simpler as they are based on fewer concepts
and less notation; they are more natural as they clearly separate the logical from
the probabilistic component; and they are more expressive as they can represent
functors and continuous variables.



Probabilistic and Bayesian logic programs are also related to Poole's
framework of probabilistic Horn abduction [Poo93], which is "a pragmatically-
motivated simple logic formulation that includes definite clauses and probabilities
over hypotheses" [Poo93]. Poole's framework provides a link to abduction and
assumption-based reasoning. However, as Ngo and Haddawy point out,
probabilistic and therefore also Bayesian logic programs impose fewer constraints
on the representation language, represent probabilistic dependencies directly
rather than indirectly, have a richer representational power, and their
independency assumption reflects the causality of the domain.



8.2. Probabilistic relational models 



Koller et al. [FGKP99,Kol99] define probabilistic relational models, which
are based on the well-known entity/relationship model. Figure 14 shows an example
from [FGKP99]: "it is a genetic model of the inheritance of a single gene that
determines a person's blood type. Each person has two copies of the chromosome
containing this gene, one inherited from her mother, and one inherited from her
father. There is also a possibly contaminated test that attempts to recognize the
person's blood type." In probabilistic relational models, the random variables are
the attributes. The relations between entities are deterministic, i.e. they are only
true or false. To represent this within Bayesian logic programs Koller et al. use
the following normal form: each attribute a of an entity type E is a Bayesian
predicate a(E) and each n-ary relation r is an n-ary logical Bayesian predicate
r. Probabilistic relational models consist of a qualitative dependency structure
over the attributes and their associated quantitative parameters (the conditional
probability densities). Koller et al. distinguish between two types of parents of an
attribute. First, an attribute a(X) can depend on another attribute b(X), e.g.
the blood type of a person depends on the chromosome inherited from its father
(p_chrom). This is equivalent to the Bayesian clause a(X) | b(X). Second, an
attribute a(X) possibly depends on an attribute b(Y) of an entity Y related to
X, e.g. the chromosomes of a person depend on the chromosomes of its mother
(m_chrom). The relation between X and Y is described by a slot s(X, Y) which
is either a projection of a relation, i.e. s(X, Y) :- r(X_1, ..., X_n), or a composition
of slots, i.e. s(X, Y) :- s_1(X, X_1), s_2(X_1, X_2), ..., s_m(X_{m-1}, Y). Given these
logical Bayesian clauses, the original dependency is represented by a(X) | s(X, Y), b(Y).
Thus, the example in Figure 14 can be represented as the Bayesian logic program

m_chrom(X) | mother(X,Y), p_chrom(Y), m_chrom(Y).

p_chrom(X) | father(X,Y), p_chrom(Y), m_chrom(Y).

blood_type(X) | m_chrom(X), p_chrom(X).
contaminated(X) | blood_test(X).

result(X) | test_of(X,Y), contaminated(X), blood_type(Y).

One original feature of probabilistic relational models concerns the way they deal
with multiple instantiations of a single clause. For this purpose Bayesian logic
programs employ combining rules. Probabilistic relational models, however, use
aggregate functions (as in database languages) to map multiple values of an
attribute onto a single value. E.g. consider the above clause for m_chrom and assume
that there are multiple possible values for Y. Then the probabilistic relational
model would apply an aggregate function to these values and specify a probability
density conditioned over the derived attributes. To use aggregate functions
within Bayesian logic programs, one has to simulate this calculation
by moving the aggregate function inside the combining rules. The input
of a combining rule is a set of conditional probability densities specifying partial
influences. Indeed, the associated conditional probability densities of a Bayesian
logic program list the involved random variables as well as their values, so that
the combining rule could implicitly employ the aggregate functions to realize the
same effects. It has to compute the aggregation for all possible states of the involved
random variables.

At this point it should be clear that probabilistic relational models employ a
more restricted logical component than Bayesian logic programs do: The component
is a restricted version of the commonly used entity/relationship model, where
relations have attributes, and any entity/relationship model can be represented
using a (range-restricted) definite clause logic. Finally, let us point out one
difficulty introduced by the concept of slot chains, as slots are by definition binary
relations. It is well-known from database theory that, in general, a ternary
relation cannot be represented using binary relations only (see e.g. [EN94]).
Suppose, for instance, that we have the ternary relation beer_drinker¹²



beer_drinker

pub        guest     beer
Kowalski   Kemper    Pils
Kowalski   Eickler   Hefeweizen
Innsteg    Kemper    Hefeweizen



The slot chain beer_drinker(pub, guest) ∘ beer_drinker(guest, beer) yields the
relation

beer_drinker(pub, guest) ∘ beer_drinker(guest, beer)

pub        guest     beer
Kowalski   Kemper    Pils
Kowalski   Kemper    Hefeweizen
Kowalski   Eickler   Hefeweizen
Innsteg    Kemper    Pils
Innsteg    Kemper    Hefeweizen



where spurious tuples bearing wrong information are introduced. This will also 
be the case for any other projection of the ternary relation. As a consequence, 
it is unclear how to handle such ternary relations using probabilistic relational 
models. 
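
The effect can be reproduced with a few lines of Python. The sketch projects the
ternary relation onto the two binary relations used by the slot chain and joins
them again on the shared guest column; the variable names are illustrative
assumptions.

```python
beer_drinker = {
    ("Kowalski", "Kemper", "Pils"),
    ("Kowalski", "Eickler", "Hefeweizen"),
    ("Innsteg", "Kemper", "Hefeweizen"),
}

# Binary projections of the ternary relation.
pub_guest = {(pub, guest) for pub, guest, _ in beer_drinker}
guest_beer = {(guest, beer) for _, guest, beer in beer_drinker}

# Composition of the two slots: join on the guest column.
slot_chain = {
    (pub, guest, beer)
    for pub, guest in pub_guest
    for guest2, beer in guest_beer
    if guest == guest2
}

# Tuples produced by the chain that were never in the original relation.
spurious = slot_chain - beer_drinker
```

The join yields five tuples, two of which, (Kowalski, Kemper, Hefeweizen) and
(Innsteg, Kemper, Pils), are not in the original relation.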



8.3. Relational Bayesian networks 



Jaeger [Jae97] considers Bayesian networks where the nodes are predicate 
symbols. The states of these random variables are possible interpretations of 
the symbols over an arbitrary, finite domain (here we only consider Herbrand 
domains), i.e. the random variables are set-valued. On the other hand, the infer- 
ence problem addressed by Jaeger does not ask for the probability of a ground 
atom belonging to any specific interpretation, but only for the probability that 
an interpretation contains that ground atom. Under these conditions, relational 
Bayesian networks are viewed as Bayesian networks where the nodes are the 
ground atoms and all random variables have the domain {true, false}¹³. The key
difference between relational Bayesian networks and Bayesian logic programs is 
that the quantitative information is specified by so called probability formulas. 
These formulas employ the notion of combination functions, functions that map 
every finite multiset with elements from [0, 1] into [0, 1], as well as that of equality 

12 The example is taken from [KE97, page 153]. 

13 It is possible, but complicated, to model domains having more than two values.
constraints (footnote 14). Let F_cancer(x) be

    noisy-or{ comb_Γ{ exposed(x,y,z) | z; true } | y; true }

This formula states that for any specific organ y, multiple exposures to
radiation have a cumulative effect on the risk of developing cancer of y. But
developing cancer at any of the various organs y can be viewed as independent
causes. As shown in [Jae97], a probability formula specifies not only the densities
but also the dependency structure. Because of this and the computational power
of combining rules, a probability formula is easily expressed as a set of Bayesian
clauses: the head of each Bayesian clause is the corresponding Bayesian atom and
the bodies consist of all maximally generalized Bayesian atoms occurring in the
probability formula. The combining rule can then select the right ground atoms
and simulate the probability formula on them. This is always possible because the
Herbrand base is finite. E.g. the clause cancer(X) | exposed(X,Y,Z) together
with the right combining rule and associated conditional probability densities
models the example formula. For a more detailed discussion we refer to [Ker00].
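
To make the notion of a combination function concrete, the following Python sketch evaluates a nested formula of the above shape on a small input. It is an illustration under stated assumptions, not Jaeger's definition: noisy_or is the usual 1 - prod(1 - p), whereas capped_sum is merely a hypothetical stand-in for the cumulative inner combination function, which the text does not spell out.

```python
from math import prod


def noisy_or(values):
    """Noisy-or of a finite multiset of values in [0, 1]."""
    return 1.0 - prod(1.0 - v for v in values)


def capped_sum(values):
    """Hypothetical stand-in for a cumulative combination function.

    Assumption made for this sketch only: exposures add up and the
    result is capped at 1 so that it stays within [0, 1].
    """
    return min(1.0, sum(values))


def cancer_probability(exposures_by_organ):
    """Combine exposures per organ first, then treat the organs as independent causes."""
    per_organ = [capped_sum(values) for values in exposures_by_organ.values()]
    return noisy_or(per_organ)


# Two exposures of the lung and one of the skin (made-up numbers).
example = {"lung": [0.1, 0.2], "skin": [0.5]}
print(cancer_probability(example))
```

Both functions map every finite multiset over [0, 1] into [0, 1], which is the only property the text requires of a combination function.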

9. Conclusion 

We have introduced Bayesian logic programs, their representation language, 
their declarative semantics and their query-answering procedure. Bayesian logic 
programs are a novel framework for combining Bayesian networks with definite 
clause logic. The main idea of Bayesian logic programs is to establish a one-to- 
one mapping between true (logical) ground atoms and random variables. This 
ensures a unique probability density over the random variables corresponding to 
the least Herbrand model. The least Herbrand model of a Bayesian logic pro- 
gram together with its direct influence relation is interpretable as a (possibly in- 
finite) Bayesian network where the parents of a random variable X are (Bayesian) 
ground atoms directly influencing X. The query-answering procedure adapts the
two-phase strategy of knowledge-based model construction methods: (1) construct
the support network, and (2) apply a Bayesian network inference algorithm (exact
or approximate) to the support network. The procedure is based on AND/OR
graphs, which allow the logical and the probabilistic information to be combined
intuitively. It turned out that even if the least Herbrand model is infinite, every
well-defined probabilistic query is computable.

We also argued that Bayesian logic programs inherit the advantages of both 
Bayesian networks and definite clause logic, including the strict separation of 
qualitative and quantitative aspects. Indeed, Bayesian logic programs can natu- 
rally model any type of Bayesian network (including those involving continuous 
variables) as well as any type of "pure" Prolog program (including those involving 

14 To simplify the discussion, we will further ignore these equality constraints here. For details
we refer to [Ker00].
functors). Therefore Bayesian logic programs should be easy to use and to
understand for people familiar with the underlying representations. We also
demonstrated that Bayesian logic programs can model dynamic Bayesian networks and
hidden Markov models, and investigated their relationship to other first-order
extensions of Bayesian networks. Finally, a simple Prolog meta-interpreter was
presented that performs knowledge base construction for Bayesian logic programs.

In the future, we plan to apply Bayesian logic programs to some real-world
problems. Also, the learning of Bayesian logic programs will be investigated. We
believe that the strict separation property will also be advantageous in this
respect, as inductive logic programming techniques [MD94] could be applied to learn
the structure of the Bayesian logic program and probabilistic Bayesian network
techniques could be applied to determine the parameters of the network. Some
preliminary suggestions concerning learning can be found in [KDK00]. Further
work may also be concerned with more efficient inference algorithms, along the
lines of e.g. structured inference [Pfe99] or [KMP97]. Finally, the relation
between Bayesian logic programs and other probabilistic first-order logics, such as
Muggleton's stochastic logic programs [Mug96], could be investigated. This
question seems to be related to the difference between Halpern's type I and type II
probabilistic structures [Hal89].
Appendix

A. The Mathematical Background of the Proofs of Theorem 4.9 and 
Theorem 5.3 

Here we introduce the concepts of probability theory needed to prove Theo- 
rem 4.9 and Theorem 5.3. More information can be found e.g. in [Bau91,Bau92, 
FG97]. 

A system $\mathcal{A}$ of subsets of a set $\Omega$ is a $\sigma$-algebra (in $\Omega$) if it
exhibits the following features: (1) $\Omega \in \mathcal{A}$; (2) if $A \in \mathcal{A}$ then
$\bar{A} \in \mathcal{A}$, where $\bar{A}$ denotes the complementary set of $A$; (3) for each
sequence $(A_n)$, $A_n \in \mathcal{A}$, the proposition $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$
holds. E.g. the power set of a given set is a $\sigma$-algebra. A probability
space is a tuple $(\Omega, \mathcal{A}, P)$ where $\Omega$ is a set, $\mathcal{A}$ is a $\sigma$-algebra
in $\Omega$ and $P : \mathcal{A} \to [0, 1]$ a probability measure. A probability measure $P$
(over $(\Omega, \mathcal{A})$) satisfies (1) $P(\emptyset) = 0$, (2) for each sequence $(A_n)$ of
pairwise disjoint sets in $\mathcal{A}$, whose union lies in $\mathcal{A}$, the equation
$P(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} P(A_n)$ holds, and (3) $P(\Omega) = 1$. One can show
that for all $A \in \mathcal{A}$: $0 \le P(A) \le 1$. The elements of $\Omega$ are called the elemental
events, and the elements of $\mathcal{A}$ are called events; the empty set is the impossible
and $\Omega$ the certain event. The value $P(A)$ for $A \in \mathcal{A}$ is called the probability
of $A$. A pair of a set and a $\sigma$-algebra over that set is called a measurable space,
e.g. $(\Omega, \mathcal{A})$ is a measurable space. A mapping $T : \Omega \to \Omega'$, where $(\Omega, \mathcal{A})$
and $(\Omega', \mathcal{A}')$ are measurable spaces, is called $\mathcal{A}$-$\mathcal{A}'$-measurable if for all
$A' \in \mathcal{A}'$: $T^{-1}(A') \in \mathcal{A}$.

Let $(\Omega', \mathcal{A}')$ be a measurable space. A random variable $X$ with domain
$D(X) = \Omega'$ is an $\mathcal{A}$-$\mathcal{A}'$-measurable function $X : \Omega \to \Omega'$. The distribution
of $X$ is the image measure $X(P)$. In the case of $(\Omega', \mathcal{A}') = (\mathbb{R}, \mathcal{B}^1)$ we call $X$
a real random variable, where $\mathcal{B}^1$ is the minimal $\sigma$-algebra which contains all
half-open intervals $[a, b) \subset \mathbb{R}$. Let $I$ be a countable index set, $(x_n)_{n \in I}$ a
sequence in $\mathbb{R}$ and $(a_n)_{n \in \mathbb{N}}$ a sequence of non-negative real numbers with
$\sum_{n=1}^{\infty} a_n = 1$. A discrete random variable is a real random variable having
$P = \sum_{n=1}^{\infty} a_n \varepsilon_{x_n}$ as distribution, where $\varepsilon_{x_n}(A) = 1$ if $x_n \in A$
and $\varepsilon_{x_n}(A) = 0$ if $x_n \notin A$. In the case of $I$ being finite, $X$
would be called a finite random variable.
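
For a finite random variable the definition can be checked directly. The Python sketch below is our own illustration (the names dirac and discrete_distribution are hypothetical): it represents events as finite Python sets and builds the distribution as a weighted sum of Dirac measures.

```python
def dirac(point):
    """Dirac measure at `point`: 1 if the point lies in the event, else 0."""
    return lambda event: 1.0 if point in event else 0.0


def discrete_distribution(weights):
    """Distribution sum_n a_n * dirac(x_n) for a dict {x_n: a_n}.

    The weights must be non-negative and sum to 1, as required in the text.
    """
    if any(a < 0 for a in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    def measure(event):
        return sum(a * dirac(x)(event) for x, a in weights.items())

    return measure


# A finite random variable taking the values 0, 1, 2 (made-up weights).
P = discrete_distribution({0: 0.25, 1: 0.5, 2: 0.25})
print(P({0, 1}), P(set()), P({0, 1, 2}))
```

The three printed values correspond to an ordinary event, the impossible event and the certain event.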

Let $T$ be a countable index set. A real discrete-time stochastic process $S$ is
a tuple $S = (\Omega, \mathcal{A}, P, (X_t)_{t \in T})$ where $(\Omega, \mathcal{A}, P)$ is a probability space
and $(X_t)_{t \in T}$ is a family of real random variables over this probability space. Let
$J \subset T$ be finite. The distribution of $(X_t)_{t \in J}$, denoted as $P_J$, is defined to be
$X_J(P)$ where $X_J$ is the product mapping $X_J := \bigotimes_{i \in J} X_i$ (cf. [Bau91]). Let
$\mathcal{H}(T)$ be the set of all non-empty, finite subsets of $T$. The family
$(P_J)_{J \in \mathcal{H}(T)}$ is called the family of finite-dimensional distributions of $S$.
Kolmogorov's theorem reads with respect to real discrete-time stochastic
processes as follows:

Theorem A.1 (Kolmogorov). Let $T$ be a non-empty, countable set and
$(\mathbb{R}^J, \mathcal{B}^J, P_J)_{J \in \mathcal{H}(T)}$ a family of probability spaces. If the family is
projective, then there exists a unique probability measure $P_T$ over $(\mathbb{R}^T, \mathcal{B}^T)$
with $\mathrm{proj}^T_J(P_T) = P_J$ for all $J \in \mathcal{H}(T)$. The family
$(\mathbb{R}^J, \mathcal{B}^J, P_J)_{J \in \mathcal{H}(T)}$ is called projective if for all $H, J \in \mathcal{H}(T)$ with
$J \subseteq H$ the equation $P_J = \mathrm{proj}^H_J(P_H)$ holds, where $\mathrm{proj}^H_J$ is the
projection of $\mathbb{R}^H$ onto $\mathbb{R}^J$.

If we interpret the index set $T$ as time steps, then "the preceding result [...] says
that the distribution of $(X_t)_{t \in T}$ can be specified by giving the 'initial distribution'
(the distribution of $X_0$) and, for each time $n$, giving the conditional distribution
of the 'present' value $X_n$ in term of the 'past' values $X_0, \ldots, X_{n-1}$" [FG97, page
433].

In the proofs of Theorem 4.9 and of Theorem 5.3 we will use one of the key
features of Bayesian networks. An induction over the size of a Bayesian network
$N$ over the real random variables $X_1, \ldots, X_l$, $l > 0$, shows that $N$ specifies a
projective family of probability measures. Thus, it holds

    $P_J = \mathrm{proj}^H_J(P_H)$,

that is, $P_J$ is obtained from $P_H$ by projecting onto the coordinates in $J$ and thereby
marginalizing out the variables in $D$, where $J \subseteq H \subseteq N$ and $D = H \setminus J$.
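
For finite random variables this property is ordinary marginalization and can be checked numerically. The following Python sketch is an illustration only: the two-node network A -> B and its numbers are made up, with H = {A, B}, J = {A} and D = {B}.

```python
from itertools import product

# Hypothetical two-node Bayesian network A -> B with boolean variables.
P_A = {True: 0.3, False: 0.7}
P_B_GIVEN_A = {
    True: {True: 0.9, False: 0.1},
    False: {True: 0.2, False: 0.8},
}


def joint_distribution():
    """Joint distribution over H = {A, B} given by the chain rule."""
    return {
        (a, b): P_A[a] * P_B_GIVEN_A[a][b]
        for a, b in product([True, False], repeat=2)
    }


def marginalize_out_b(joint):
    """Sum out D = {B} to obtain the distribution over J = {A}."""
    marginal = {}
    for (a, _b), probability in joint.items():
        marginal[a] = marginal.get(a, 0.0) + probability
    return marginal


P_H = joint_distribution()
P_J = marginalize_out_b(P_H)
print(P_J)
```

Up to rounding, the marginal P_J coincides with the distribution that the sub-network consisting of A alone specifies, which is what projectivity demands.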
B. Interpreting Bayesian Logic Programs using Prolog - BLoP

Here we present a rudimentary interpreter of Bayesian logic programs in
Prolog. It is intended to give a "playful" understanding of Bayesian logic
programs. We would like to point out that the interpreter is by no means optimal.
It is just a simple and direct implementation of the query-answering procedure.
Therefore, we have implemented neither pruning nor optimization mechanisms
for constructing the SLD tree.

The interpreter resembles the first phase of the query-answering procedure
in Section 6.2: it builds the support network with respect to a given Bayesian
logic program and a well-defined probabilistic query. The support network is
specified in the Hugin net file language (see [Hugb,Hug01]) and written to the file
'out.net'. It can directly be loaded into the HUGIN system (a demo version of
Hugin can be downloaded from www.hugin.com), although the file out.net itself
is easily readable for humans. The heart of the interpreter (prove/3) is a common
Prolog meta-interpreter as one can find in every standard introductory textbook
on Prolog (e.g. [SS86,Bra86,Fla94]). The code was developed under SICStus 3.8.1.
It is best explained using an example.

Assume we want to answer the query ?- height(fred) | height(ann)=159
to the Bayesian logic program height, i.e. we need the
support network N(height(fred), height(ann)). The height program can be
formulated as follows:

    domain(father/2, discrete, [true, false]).
    domain(mother/2, discrete, [true, false]).
    domain(height/1, continuous, real).

We have two Bayesian predicates father/2 and mother/2, both having the
(finitely) discrete domain {true, false}. Furthermore, height/1 is a Bayesian
predicate with a continuous domain, the real numbers. We use the usual Prolog
notation p/N to refer to an N-ary predicate p.

    combining_rule(father/2, id).
    combining_rule(mother/2, id).
    combining_rule(height/1, id).

The associated combining rules are the identity. Other combining rules
can easily be defined (see below). The qualitative aspects of the height
domain are specified as follows:

    father(unknown1,fred).   mother(ann,fred).     father(brian,dorothy).
    mother(ann,dorothy).     father(brian,eric).   mother(cecily,eric).
    father(unknown2,gwenn).  mother(ann,gwenn).    father(fred,henry).
    mother(dorothy,henry).   father(eric,irene).   mother(gwenn,irene).
    father(henry,john).      mother(irene,john).

    height(ann). height(brian). height(cecily).
    height(unknown1). height(unknown2).

    height(X) | mother(Y,X), father(Z,X), height(Y), height(Z).
The associated conditional probability densities are specified using cpd/2.
The first argument is the corresponding Bayesian clause. The second argument is
the list of densities. (Finitely) discrete densities are represented as a list of Prolog
terms, whereas conditional Gaussian densities are represented by a list of ground
atoms of the form normal(m,v). The terms m and v refer to the mean and the variance
of a Gaussian density. The notation is based upon the HUGIN net specification
language; for a discussion we refer to [Hugb,Hug01]. We believe it should be
easy to extend the supported densities with respect to the HUGIN net language.

    cpd(father(unknown1,fred), [1.0,0.0]).   cpd(mother(ann,fred), [1.0,0.0]).
    cpd(father(brian,dorothy), [1.0,0.0]).   cpd(mother(ann,dorothy), [1.0,0.0]).
    cpd(father(brian,eric), [1.0,0.0]).      cpd(mother(cecily,eric), [1.0,0.0]).
    cpd(father(unknown2,gwenn), [1.0,0.0]).  cpd(mother(ann,gwenn), [1.0,0.0]).
    cpd(father(fred,henry), [1.0,0.0]).      cpd(mother(dorothy,henry), [1.0,0.0]).
    cpd(father(eric,irene), [1.0,0.0]).      cpd(mother(gwenn,irene), [1.0,0.0]).
    cpd(father(henry,john), [1.0,0.0]).      cpd(mother(irene,john), [1.0,0.0]).

    cpd(height(ann), [normal(165,60)]).      cpd(height(brian), [normal(165,60)]).
    cpd(height(cecily), [normal(165,60)]).   cpd(height(unknown1), [normal(165,60)]).
    cpd(height(unknown2), [normal(165,60)]).

    cpd((height(X) | mother(Y,X), father(Z,X), height(Y), height(Z)),
        [normal(0.5*height(Y)+0.5*height(Z), 0),
         normal(165,60), normal(165,60), normal(165,120)]).

We will now describe the core of the BLoP interpreter. The BLoP interpreter
is started using blop_shell, which opens a small shell for computing support
networks. First, we have to consult the height Bayesian logic program with
['height.blp']. Then, the support network N(height(fred), height(ann)) is
computed by typing height(fred) | height(ann)=155. at the shell prompt. The
algorithmic skeleton (see Algorithm 6.2) is realized by query/2.

    query(Q,E) :- !,
        retractall(proof(_)), retractall(network(_)),
        assert(proof([])),
        evidence_variables(E,EVars), append(Q,EVars,Vars),
        solution_graph(Vars,S),
        support_network(S,SNC),
        apply_comb(SNC,SN), asserta(network(SN)),
        write_support_network(SN).

    evidence_variables([],[]).
    evidence_variables([V=_|Vs],[V|Vars]) :-
        evidence_variables(Vs,Vars).

After extracting the random variables occurring in the evidence with
evidence_variables(E,EVars), query/2 computes the solution graph S for the
variables Vars by calling solution_graph(Vars,S).

    prove(true,P,P) :- !.
    prove((GA,GB),OP,NP) :- !,
        prove(GA,OP,PA), prove(GB,PA,NP).
    prove(G,OP,[G-B|P]) :-
        clause(G,B), prove(B,OP,P).

    solution_graph([G|Gs],S) :-
        prove(G,[],P), assertz(proof(P)), fail;
        solution_graph(Gs,S);
        assertz(proof(end)), collect(US), sort(US,S).

    collect(L) :-
        retract(proof(X)), !,
        (X == end, !, L = [];
         collect(R), append(X,R,L)).

The Prolog procedure solution_graph/2 works similarly to findall/3 as
defined in [Bra86]. The output S is a lexicographically ordered Prolog list of edges
A-B from an or node A to an and node B in the solution graph of Vars. To compute
the successful paths we have adapted a depth-first searching meta-interpreter
prove/3 as described in [SS86]. It returns the list of successful paths in the last
argument. Given the solution graph S, the Prolog procedure support_network/2
looks up the associated conditional probability densities of each clause used to
build S:

    support_network([],[]).
    support_network([N-Pa|Es],[(N,T,D,[Parents|Pas],[CPD|CPDs])|SN]) :-
        functor(N,P,NA), domain(P/NA,T,D),
        cpd(N,Pa,Parents,CPD),
        support_network(N,Es,Pas,CPDs,Rest),
        support_network(Rest,SN).

    support_network(_,[],[],[],[]).
    support_network(N,[N-Pa|R],[Parents|Pas],[CPD|CPDs],Rest) :-
        cpd(N,Pa,Parents,CPD),
        support_network(N,R,Pas,CPDs,Rest).
    support_network(_,R,[],[],R).
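
For readers less familiar with Prolog, the construction of the solution graph and the subsequent grouping of its edges can be sketched in Python. The sketch rests on simplifying assumptions that do not hold for the interpreter itself: clauses are given in ground form (no unification), the program is acyclic, and all names (CLAUSES, prove, prove_all, solution_graph, group_by_head) are our own hypothetical choices.

```python
from itertools import groupby

# Ground fragment of the height program: head -> list of alternative bodies.
CLAUSES = {
    "mother(ann,fred)": [()],
    "father(unknown1,fred)": [()],
    "height(ann)": [()],
    "height(unknown1)": [()],
    "height(fred)": [("mother(ann,fred)", "father(unknown1,fred)",
                      "height(ann)", "height(unknown1)")],
}


def prove(goal, clauses):
    """Yield one list of (head, body) edges per successful proof of `goal`."""
    for body in clauses.get(goal, []):
        for sub_proof in prove_all(body, clauses):
            yield [(goal, body)] + sub_proof


def prove_all(goals, clauses):
    """Prove a conjunction of goals from left to right (depth first)."""
    if not goals:
        yield []
        return
    for first in prove(goals[0], clauses):
        for rest in prove_all(goals[1:], clauses):
            yield first + rest


def solution_graph(queries, clauses):
    """Collect the edges of all proofs as a sorted list without duplicates."""
    edges = set()
    for query in queries:
        for proof in prove(query, clauses):
            edges.update(proof)
    return sorted(edges)


def group_by_head(edges):
    """Group the sorted edges by head, yielding one list of parent sets per node."""
    return {head: [body for _, body in group]
            for head, group in groupby(edges, key=lambda edge: edge[0])}


edges = solution_graph(["height(fred)", "height(ann)"], CLAUSES)
parents = group_by_head(edges)
print(parents["height(fred)"])
```

As in the Prolog code, sorting places all edges with the same head next to each other, so a single pass suffices to gather the parent sets of each node.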

The last step is to compute the combined conditional probability densities. 
This is done using apply_comb/2: 

    apply_comb([],[]).
    apply_comb([F|Fs],[CF|CFs]) :-
        combine(F,CF),
        apply_comb(Fs,CFs).

    % Nodes are 5-tuples (N,T,D,ParentSets,CPDs), as built by support_network/2
    % and consumed by write_nodes/1.
    combine((N,T,D,LPa,LCPDs),(N,T,D,Pa,CPD)) :-
        functor(N,P,NA),
        clause(combining_rule(P/NA,CR),_),
        C =.. [CR,LPa,LCPDs,Pa,CPD],
        call(C).

The corresponding combining rule of the Bayesian predicate P is
called by call(C). A combining rule cr is seen as a Prolog procedure
cr(PaL,CPDL,Pa,CPD). The PaL argument is the given list of parent sets.
The CPDL argument specifies the corresponding conditional probability densities.
Thus, the i-th element of CPDL contains the conditional probability densities
associated with the i-th parent set in PaL. The resulting set of parents and the
combined conditional probability densities are returned in Pa and CPD.
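
The calling convention can be mirrored in Python as follows. This is only a sketch of the interface under our own assumptions: the registry COMBINING_RULES and the functions id_rule and combine are hypothetical names, and, unlike the Prolog goal, which simply fails, the identity rule here raises an exception when it is given more than one parent set.

```python
def id_rule(parent_sets, cpds):
    """Identity combining rule: defined only for a single parent set."""
    if len(parent_sets) != 1 or len(cpds) != 1:
        raise ValueError("the identity rule expects exactly one parent set")
    return parent_sets[0], cpds[0]


# Hypothetical registry playing the role of the combining_rule/2 facts.
COMBINING_RULES = {
    ("father", 2): id_rule,
    ("mother", 2): id_rule,
    ("height", 1): id_rule,
}


def combine(predicate, arity, parent_sets, cpds):
    """Look up the rule registered for predicate/arity and apply it."""
    rule = COMBINING_RULES[(predicate, arity)]
    return rule(parent_sets, cpds)


parents, cpd = combine(
    "height", 1,
    [["mother(ann,fred)", "father(unknown1,fred)", "height(ann)", "height(unknown1)"]],
    [["normal(165,60)"]],
)
print(parents, cpd)
```

A rule for several parent sets would have the same signature and return a single merged parent set together with a single combined density.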
After performing all steps, the support network N(height(fred), height(ann))
is given in the variable SN. It is written to the file out.net. The computed
support network N(height(fred), height(ann)) equals the support network
N(height(fred)) shown in Figure 7. The rest of the code implements the shell and
the input/output operations. Because it gives no further insight into the
interpreter, we do not explain it.



    % Libraries, operator and dynamic declarations
    :- use_module(library(lists)).
    :- use_module(library(charsio)).
    :- op(500,xfy,'|').
    :- dynamic domain/3, combining_rule/2, network/1.

    % Translation of Bayesian clauses and declarations at consult time
    term_expansion((Head|Body1,Body2),(Head:-Body1,Body2)).
    term_expansion((Head|Body),(Head:-Body)).
    term_expansion(domain(A/N,T,D),(:- dynamic A/N)) :-
        retractall(domain(A/N,_,_)), assert(domain(A/N,T,D)).
    term_expansion(combining_rule(A/N,B),(:- dynamic A/N)) :-
        retractall(combining_rule(A/N,_)), assert(combining_rule(A/N,B)).
    term_expansion(cpd((H|B),CPD),cpd(H,B,BL,CPD)) :-
        conj_2_list(B,BL).
    term_expansion(cpd(H,CPD),cpd(H,true,[],CPD)) :- !.

    conj_2_list((A,B),L) :-
        conj_2_list(A,LA),
        conj_2_list(B,LB),
        append(LA,LB,L).
    conj_2_list((A),[A]).

    % The shell
    blop_shell :-
        assert(network([])), blop_shell_help, blop_shell(next).
    blop_shell(next) :- !,
        blop_shell_prompt, char_conversion('|',';'),
        read(Goal), char_conversion('|','|'), blop_shell(Goal).
    blop_shell(exit).
    blop_shell_help :- !, nl,
        write('***************************************************************'),nl,
        write('* BLoP - Bayesian Logic Programs Interpreter                  *'),nl,
        write('***************************************************************'),nl,
        write('* help.                      |+ this message                  *'),nl,
        write('* Q1,...,QN.                 |+ support network for computing *'),nl,
        write('*                            |  p(Q1,...,QN)                  *'),nl,
        write('* Q1,...,QN|E1=e1,...,EM=eM. |+ support network for computing *'),nl,
        write('*                            |  p(Q1,...,QN|E1=e1,...,EM=eM)  *'),nl,
        write('* ["height.blp"].            |+ consult BLP file "height.blp" *'),nl,
        write('* exit.                      |+ exit the shell                *'),nl,
        write('***************************************************************'),nl,nl.
    blop_shell(help) :-
        blop_shell_help, blop_shell(next).
    blop_shell([X]) :-
        consult(X), !, blop_shell(next).
    blop_shell((Q;E)) :- !,
        query([Q],[E]), blop_shell(next).
    blop_shell(Q) :-
        query([Q],[]), !, blop_shell(next).
    blop_shell(_) :- !,
        format('~nUnknown command !~n~n',[]), blop_shell(next).
    blop_shell_prompt :-
        write('<BLoP> ?- '), flush_output(user_output).

    % The identity combining rule
    id([Pa],[CPD],Pa,CPD).

    % Output of the support network in the Hugin net language
    write_support_network(SN) :-
        flush_output(user_output), tell('out.net'),
        format('net~n{~n node_size = (100 40);~n}~n~n',[]),
        write_nodes(SN), write_potentials(SN), told.

    write_nodes([]).
    write_nodes([(N,T,D,_,_)|R]) :- write_node(N,T,D), write_nodes(R).

    write_node(N,continuous,_) :-
        format('continuous node ~@~n{~n label = "~q";~n}~n~n',
               [write_no_brackets(N),N]).
    write_node(N,discrete,D) :-
        format('discrete node ~@~n{~n states = ( ~@ );~n label = "~q";~n}~n~n',
               [write_no_brackets(N),write_ddomain(D),N]).

    write_ddomain([]).
    write_ddomain([D|Ds]) :- format('"~q" ',[D]), write_ddomain(Ds).

    write_potentials([]).
    write_potentials([(N,continuous,_,Pa,CPD)|R]) :-
        format('potential (~@ ~@)~n{~n data = ( ~@ );~n}~n~n',
               [write_no_brackets(N),write_cond(Pa),write_continuous_cpd(CPD)]),
        write_potentials(R).
    write_potentials([(N,discrete,_,Pa,CPD)|R]) :-
        format('potential (~@ ~@)~n{~n data = ( ~@ );~n}~n~n',
               [write_no_brackets(N),write_cond(Pa),write_list(CPD)]),
        write_potentials(R).

    write_cond([]).
    write_cond(P) :- format('| ~@ ',[write_list(P)]).

    write_continuous_cpd([]).
    write_continuous_cpd([normal(M,V)|Ps]) :-
        format(' normal (~@, ~@) ',[write_no_brackets(M),write_no_brackets(V)]),
        write_continuous_cpd(Ps).

    write_list([]).
    write_list([P|Ps]) :-
        format(' ~@ ',[write_no_brackets(P)]), write_list(Ps).

    % Node names: drop "(" and ")" (codes 40, 41), replace "," (44) by "_" (95)
    write_no_brackets(T) :-
        write_to_chars(T,C), delete(C,40,C1), delete(C1,41,C2),
        substitute(44,C2,95,NT), name(N,NT), write(N).
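
The name mangling performed by write_no_brackets/1 is easy to state in isolation: parentheses are removed and commas are replaced by underscores, so that a ground atom becomes a plain node name. The Python sketch below illustrates this on the textual form of a term; the function name no_brackets is our own hypothetical choice, and the sketch assumes the term has already been written to a string without spaces.

```python
def no_brackets(term_text):
    """Remove '(' and ')' and replace ',' by '_' in the text of a term."""
    return term_text.replace("(", "").replace(")", "").replace(",", "_")


for atom in ["height(fred)", "father(unknown1,fred)"]:
    print(atom, "->", no_brackets(atom))
```

Note that the mapping is not injective in general, so two different atoms could in principle be written out under the same node name.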



